EXECUTIVE SUMMARY


The research conducted under contract AFOSR 83-0278 is reported in seven
technical reports corresponding to Chapters 1 through 7 in this report. A brief
description of each study follows:


CHAPTER 1 USING TWO SEQUENCES OF PURE NETWORK PROBLEMS
TO SOLVE THE MULTICOMMODITY NETWORK FLOW PROBLEM

Summary:  This  paper  presents  a  new  algorithm  for  solving  large  multicommodity 
network  flow  problems.  The  work  was  motivated  by  the  Casualty 
Evacuation Model originally developed by Lt. Col. Dennis McLain.
Captain  Robert  Chmielewski  continued  this  activity  and  eventually  a 
modification  of  this  model  was  solved  by  the  P.I.  and  Captain 
Chmielewski  on  a  CDC  Cyber  205.  All  of  this  activity  was  directed  by 
Mr.  Thomas  Kowalsky  of  DSC/Plans  of  MAC  Headquarters. 

Publication  Status:  This  work  has  not  been  submitted  for  publication. 

Background: This was the dissertation research of Dr. Ellen Allen.


CHAPTER 2 NETWORKS WITH SIDE CONSTRAINTS:

AN LU FACTORIZATION UPDATE

Summary: An important class of mathematical programming models which are
frequently used in logistics studies is the model of a network problem
having additional linear constraints. A specialization of the primal
simplex algorithm which exploits the network structure can be
applied to this problem class. This specialization maintains the
basis as a rooted spanning tree and a general matrix called the
working basis. This paper presents the algorithms which may be used
to maintain the inverse of this working basis as an LU factorization,
which is the industry standard for general linear programming
software. Our specialized code exploits not only the network structure
but also the sparsity characteristics of the working basis.
Computational experimentation indicates that our LU implementation
results in a 50 percent savings in the non-zero elements in the eta
file, and our computer codes are approximately twice as fast as MINOS
and XMP on a set of randomly generated multicommodity network flow
problems.

Publication Status: Published in The Annals of the Society of Logistics
Engineers, 1, 1, (1986), 66-85.

Background: This work is a summary of the dissertation research of Dr.


CHAPTER 3 THE FREQUENCY ASSIGNMENT PROBLEM:

A SOLUTION VIA NONLINEAR PROGRAMMING

Summary:  This  paper  gives  a  mathematical  programming  model  for  the  problem  of 
assigning  frequencies  to  nodes  in  a  communications  network.  The 
objective  is  to  select  a  frequency  assignment  which  minimizes  both 
cochannel  and  adjacent-channel  interference.  In  addition,  a  design 
engineer  has  the  option  to  designate  key  links  in  which  the  avoidance 
of  jamming  due  to  self  interference  is  given  a  higher  priority.  The 
model  has  a  nonconvex  quadratic  objective  function,  generalized 
upper-bounding  constraints,  and  binary  decision  variables.  We 
developed  a  special  heuristic  algorithm  and  software  for  this  model 
and  tested  it  on  five  test  problems  which  were  modifications  of  a 
real-world  problem.  Even  though  most  of  the  test  problems  had  over 
600  binary  variables,  we  were  able  to  obtain  a  near  optimum  in  less 
than  12  seconds  of  CPU  time  on  a  CDC  Cyber-875. 

Publication Status: Published in Naval Research Logistics, 34, (1987), 133-139.

Background:  This  was  our  first  application  in  the  communications  area. 


CHAPTER 4 A GENERALIZATION OF POLYAK'S CONVERGENCE RESULT
FOR SUBGRADIENT OPTIMIZATION

Summary: This paper generalizes a practical convergence result first presented
by Polyak. This new result presents a theoretical justification for
the  step  size  which  has  been  successfully  used  in  several  specialized 
algorithms  which  incorporate  the  subgradient  optimization  approach. 

Publication Status: Published in Mathematical Programming, 37, 3, (1987), 309-318.

Background: The convergence theory presented in this paper was
motivated by the good computational results achieved by
Drs.  Ellen  Allen  and  Bala  Shetty  in  their  dissertations. 


CHAPTER 5 THE EQUAL FLOW PROBLEM

Summary: This paper presents a new algorithm for the solution of a network
problem with equal flow side constraints. The solution technique is
motivated by the desire to exploit the special structure of the side
constraints and to maintain as much of the characteristics of pure
network problems as possible. The proposed algorithm makes use of
Lagrangean relaxation to obtain a lower bound and decomposition by
right-hand-side allocation to obtain upper bounds. The Lagrangean
dual serves not only to provide a lower bound used to assist in
termination criteria for the upper bound, but also allows an initial
allocation of equal flows for the upper bound. The algorithm has
been tested on problems with up to 1500 nodes and 5000 arcs.
Computational experience indicates that solutions whose objective
function value is well within 1% of the optimum can be obtained in
12-65% of the MPSX time depending on the amount of imbalance inherent
in the problem. Incumbent integer solutions which are within 99.99%
feasible and well within 1% of the proven lower bound are obtained in
a straightforward manner requiring, on the average, 30% of the MPSX
time required to obtain a linear optimum.

Publication Status: This paper has been accepted for publication in the
European Journal of Operational Research.


Background: This work is a summary of the dissertation research of Dr.
Bala Shetty.


CHAPTER  6  A  PARALLELIZATION  OF  THE  SIMPLEX  ALGORITHM 

Summary: This paper presents a parallelization of the simplex method for
linear programming. Current implementations of the simplex method on
sequential computers are based on a triangular factorization of the
inverse of the current basis. An alternative decomposition designed
for parallel computation, called the quadrant interlocking factorization,
has previously been proposed for solving linear systems of
equations. This research presents the theoretical justification and
algorithms required to implement this new factorization in a simplex-based
linear programming system.

Publication Status: This paper has been submitted for publication and is
currently under review.

Background: This paper is a summary of the dissertation research of
Dr. Hossam Zaki.


CHAPTER 7 MINIMAL SPANNING TREES:

A COMPUTATIONAL INVESTIGATION OF PARALLEL ALGORITHMS

Summary: The objective of this investigation is to computationally test
parallel algorithms for finding minimal spanning trees. Computational
tests were run on a single processor using Prim's, Kruskal's
and Boruvka's algorithms. Our implementation of Prim's algorithm is
superior for high density graphs, while our implementation of
Boruvka's algorithm is best for sparse graphs. Implementations of
parallel versions of both Prim's and Boruvka's algorithms were tested
on a twenty-cpu Balance 21000. For the environment in which a minimum
spanning tree problem is a subproblem within another algorithm,
the parallel implementation of both Boruvka's and Prim's algorithms
produced speedups of three and five on five and ten processors,
respectively. The one-time overhead for process creation negates
most, if not all, of the benefits for solving a single minimum
spanning tree subproblem.

Publication Status: This paper has been submitted for publication and is
currently under review.

Background: This is our first computational investigation which has
been completed since the parallel computer arrived at
Southern Methodist University.


CHAPTER  1 




USING  TWO  SEQUENCES  OF  PURE  NETWORK  PROBLEMS  TO  SOLVE 
THE  MULTICOMMODITY  NETWORK  FLOW  PROBLEM 




A  Dissertation  Presented  to  the  Graduate 
Faculty  of  the  School  of  Engineering 
and  Applied  Science 
of 

Southern  Methodist  University 


in 

Partial  Fulfillment  of  the  Requirements 
for  the  Degree  of 
Doctor  of  Philosophy 
with  major  in 
Operations  Research 
by 

Ellen  Parker  Allen 

B.A.S.,  Southern  Methodist  University,  1975 
M.S.O.R.,  Southern  Methodist  University,  1981 


May  1985 


TABLE OF CONTENTS


ABSTRACT

LIST OF TABLES

ACKNOWLEDGEMENTS

CHAPTER I. INTRODUCTION

1.1 Notation and Conventions
1.2 Problem Definition
1.3 The Casualty Evacuation Model
1.4 Accomplishments of This Investigation

CHAPTER II. A SURVEY OF RELATED LITERATURE

2.1 Pure Networks
2.2 Multicommodity Networks
2.2.1 Partitioning Algorithms
2.2.2 Decomposition Algorithms
2.3 Subgradient Optimization

CHAPTER III. THE ALGORITHM

3.1 Subgradient Optimization
3.2 Generating Lower Bounds
3.3 Generating Upper Bounds
3.4 The Algorithm

CHAPTER IV. COMPUTATIONAL EXPERIMENTATION

4.1 Description of the Computer Programs
4.1.1 MCNF
4.1.2 EVAC
4.2 Description of the Test Problems
4.3 Summary of Computational Results
4.4 Analysis of Results

CHAPTER V. SUMMARY AND CONCLUSIONS

5.1 Summary and Conclusions
5.2 Areas for Future Investigation

LIST OF REFERENCES



LIST OF TABLES


Table

4.1 Description of the Test Problems and Summary
Comparison of Solution Times for EVAC and MCNF

4.2 Detailed Timing Statistics for EVAC Runs

4.3 Graphical Comparison of EVAC and MCNF Solution Times



CHAPTER  I 
INTRODUCTION 

This dissertation presents a new technique for solving very
large multicommodity network flow problems. The specific application
which motivated this work originated with the United States Air Force
and was first presented to us by Lt. Col. Dennis McLain, the Assistant
Director of Operations Research for the Military Airlift Command at
Scott Air Force Base. The problem is an extremely large casualty
evacuation model to be used by the Air Force in forming a plan for the
evacuation of wartime casualties. This plan would be implemented in
case of a European military conflict involving United States troops.

Lt. Col. McLain was the first to model this problem as a
multicommodity network flow problem where the commodities correspond to the
various types of wounds. The nodes represent such entities as European
bases and United States medical facilities, and the arcs represent
specific aircraft flights. (A complete description of this problem is
given in Section 1.3.) This problem is far too large to be solved by
any known existing computer codes. In addition, since many of the
data are only rough estimates (the number of casualties of various
types expected at given locations), an exact technique is not called
for. Instead a technique is needed to discover a guaranteed $\epsilon$-optimum
for any given $\epsilon > 0$.


This is precisely what our technique accomplishes. It generates
successively better upper and lower bounds on the optimum, stopping
when the two bounds are within a prescribed tolerance. We exploit
the multicommodity network structure in both the lower and upper bound
routines so that only a single commodity minimum cost network flow
optimizer is needed. EVAC, the computer code which implements our
technique, has been used to solve a series of test problems in less
time and requiring less memory than MCNF, a specialized
multicommodity network flow problem solver. In addition EVAC is capable of
solving very large problems which MCNF is unable to solve.


1.1  Notation  and  Conventions 


The notational conventions employed throughout this work are
described in this section. Matrices are denoted by upper case Latin
letters. The element of a matrix, A, which appears in the $i$th row and
$j$th column is indicated by $A_{ij}$. The symbol I is used to denote an
identity matrix with dimension appropriate to the context. Lower case
Latin and Greek letters are used to denote vectors. The symbol 0 is
used to represent a vector of zeroes with dimension appropriate to
the context. The unit vector, whose only non-zero component is a one
in the $j$th position, is denoted $e_j$. Subscripts are used to indicate
individual components of a vector, or as an index to indicate which of
a sequence of related vectors is meant. Superscripts on vectors
correspond to individual commodities. Note that vectors are considered to
be row vectors or column vectors as appropriate to the context; that
is, no special notation is used to indicate the transpose of a vector.
The inner product of two vectors, x and y, is denoted simply by xy.
The notation $\|x\|$ is used to express the Euclidean norm, $(xx)^{1/2}$.

Scalars are written as lower case Greek or Latin letters.

Euclidean n-dimensional space is denoted $R^n$. Functions are
written as lower case Latin letters, and functional values have their
arguments in parentheses. For example g(y) is used to denote the
function g evaluated at the point y. The one exception to this
convention is the projection operation described in Chapter III. In
this case P[x] is used to express the projection of x onto the
specified region.

Upper case Greek letters denote sets, with the exception that
$\partial g(y)$ is used to denote the set of subgradients of a function g at a
point y in the domain of g. The symbol $\epsilon$ is used both as the set
inclusion symbol and as a termination tolerance.

We use MAX{S} to denote the largest element of a set S;
similarly MIN{S} indicates the smallest element of S. The symbol $\infty$ is
used for infinity, and ■ denotes the end of a proof. All other
notation is standard.

1.2 Problem Definition

A network is composed of two entities: nodes and arcs. The arcs
may be viewed as unidirectional means of commodity transport, and the
nodes may be thought of as locations or terminals connected by the
arcs and served by whatever physical means of transport are associated
with the arcs. We limit our consideration to networks with finite
numbers of nodes and arcs. For a given network we denote the number
of nodes by m and the number of arcs by n. We impose an ordering on
the nodes and arcs so as to put them in a one-to-one correspondence
with the integers {1,...,m} and {1,...,n}, respectively. The
structure of a given network may be described, then, by an m x n matrix
called a node-arc incidence matrix. Such a matrix, A, is defined in
this way:

$$
A_{ij} =
\begin{cases}
+1, & \text{if arc } j \text{ is directed away from node } i \\
-1, & \text{if arc } j \text{ is directed toward node } i \\
0, & \text{otherwise.}
\end{cases}
$$

Additionally,  for  a  multicommodity  network,  we  are  concerned  with  more 
than  one  type  of  item  (commodity)  flowing  through  the  arcs.  We  order 
these  commodities  to  correspond  to  the  integers  {1,...,K}. 
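
As an illustration of the node-arc incidence matrix defined above, the following minimal Python sketch builds A for a small network. The function name `node_arc_incidence` and the three-node example are assumptions made for illustration only; they do not come from the dissertation or from the EVAC or MCNF codes.

```python
def node_arc_incidence(m, arcs):
    """Build the m x n node-arc incidence matrix A as a list of rows.

    arcs is a list of (tail, head) pairs using 1-based node numbers;
    arc j (1-based) is arcs[j - 1] and is directed from tail to head.
    """
    n = len(arcs)
    A = [[0] * n for _ in range(m)]
    for j, (tail, head) in enumerate(arcs):
        A[tail - 1][j] = 1    # arc j is directed away from its tail node
        A[head - 1][j] = -1   # arc j is directed toward its head node
    return A


# Hypothetical 3-node network with arcs 1->2, 2->3 and 1->3.
A = node_arc_incidence(3, [(1, 2), (2, 3), (1, 3)])
for row in A:
    print(row)
```

Every column contains exactly one +1 and one -1, so each column of A sums to zero.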

We define the following quantities to be used in the formulation
of the multicommodity network flow problem:

-- A is the m x n node-arc incidence matrix corresponding to the
underlying network.

-- $x^k$ is an n vector of decision variables for k = 1,...,K. Note
that $x^k_j$ represents the amount of flow of commodity k on arc j.

-- $c^k$ is an n vector of unit costs for k = 1,...,K. So
$c^k_j$ denotes the cost for one unit of flow of commodity k on arc j.

-- $r^k$ is an m vector of requirements for k = 1,...,K, so that $r^k_i$
denotes the required number of units of commodity k at node i. If
$r^k_i < 0$ then node i is said to be a demand node for commodity k
with demand = $|r^k_i|$. If $r^k_i > 0$ then node i is said to be a
supply node for commodity k with supply = $r^k_i$. And if $r^k_i = 0$
then node i is said to be a transshipment node for commodity k.

-- u is an n vector of mutual arc capacities. That is, the total
flow of all the commodities combined for arc j cannot exceed $u_j$.

-- $v^k$ is an n vector of arc capacities for commodity k (k = 1,...,K).
$v^k_j$, then, represents an upper bound on the flow of commodity k
on arc j.

We sometimes refer to the entire vector of decision variables
$(x^1,...,x^K)$ as simply x. Similarly we use c, r, and v to denote the
entire vector of costs, requirements and upper bounds, respectively.

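Using these ideas, we may formulate the multicommodity network
flow problem for a given network with m nodes, n arcs, and K commodities
as follows:

$$
\begin{aligned}
\text{Minimize} \quad & \sum_k c^k x^k \\
\text{Subject to} \quad & A x^k = r^k, \quad k = 1,\dots,K \qquad \text{(MP)} \\
& \sum_k x^k \le u \\
& 0 \le x^k \le v^k, \quad k = 1,\dots,K.
\end{aligned}
$$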
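
To make the three constraint groups of (MP) concrete, the sketch below evaluates the objective and checks feasibility of a candidate flow. It is only an illustration: the function `evaluate_mp`, the tolerance `tol`, and all of the data are hypothetical, and nothing here is taken from the EVAC or MCNF codes.

```python
def evaluate_mp(A, c, r, u, v, x, tol=1e-9):
    """Return (cost, feasible) for flows x[k][j] in problem (MP)."""
    K, m, n = len(x), len(A), len(u)
    cost = sum(c[k][j] * x[k][j] for k in range(K) for j in range(n))
    # Conservation of flow: A x^k = r^k for every commodity k.
    conserve = all(
        abs(sum(A[i][j] * x[k][j] for j in range(n)) - r[k][i]) <= tol
        for k in range(K) for i in range(m)
    )
    # Mutual arc capacities: the sum over k of x^k must not exceed u.
    mutual = all(sum(x[k][j] for k in range(K)) <= u[j] + tol
                 for j in range(n))
    # Individual bounds: 0 <= x^k <= v^k.
    bounds = all(-tol <= x[k][j] <= v[k][j] + tol
                 for k in range(K) for j in range(n))
    return cost, conserve and mutual and bounds


# Hypothetical data: arcs 1->2, 2->3, 1->3 and two commodities, each
# with a supply of 2 at node 1 and a demand of 2 at node 3.
A = [[1, 0, 1], [-1, 1, 0], [0, -1, -1]]
c = [[1, 1, 3], [1, 1, 3]]
r = [[2, 0, -2], [2, 0, -2]]
u = [3, 3, 3]
v = [[3, 3, 3], [3, 3, 3]]

good = evaluate_mp(A, c, r, u, v, [[2, 2, 0], [1, 1, 1]])
bad = evaluate_mp(A, c, r, u, v, [[2, 2, 0], [2, 2, 0]])
print(good, bad)
```

The second flow is cheaper but sends four units over the first arc, which violates the mutual capacity of three. This coupling between commodities is what prevents (MP) from separating into K independent pure network problems.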

1.3 The Casualty Evacuation Model

A  large  European  military  conflict  involving  U.S.  Armed  Forces 
could  result  in  more  casualties  than  could  be  effectively  handled  in 
European  medical  facilities.  To  alleviate  this  overcrowding,  the 
Department  of  Defense  plans  to  implement  the  following  evacuation 
policy: 

"During  the  first  30  days  of  a  conflict,  if  a  wounded 
soldier  cannot  be  returned  to  duty  within  15  days,  then 
he  will  be  evacuated  to  a  medical  facility  in  the  United 
States.  In  the  next  30  days  the  limit  on  treatment  time 
is  increased  to  30  days." 

Given a scenario concerning such a conflict (i.e. the number and
locations of wounded and the types of wounds), this evacuation problem may
be modelled as a multicommodity network flow problem. Lt. Col. Dennis
McLain was the first to model the problem in this way. In Lt. Col.
McLain's model the nodes correspond to 9 European recovery bases and
95 United States locations. The arcs correspond to aircraft flights
connecting European and U.S. facilities. The commodities are 11
different patient types.

In order to enforce a capacity on a given facility, it is
necessary to duplicate the corresponding node using the capacity as an
upper bound on the arc between the duplicate nodes. For example, if
node A represents a hospital with 300 beds, then we substitute two
nodes, A1 and A2, along with an arc whose capacity is 300. Further,
it is necessary to include 60 copies of the entire network, one for
each of the 60 one-day time periods. Additional arcs are created to
link each time period to the next. The model includes a dummy "sink"
node for each time period and one "super sink" node, along with
capacitated arcs to allow patients who have recovered to exit from the
system. These considerations produce a very large model. The
dimensions of the constraint matrix are shown below:


[Figure: block structure of the constraint matrix with its row dimensions]


where $A_1 = \dots = A_{11}$. The row dimension of this model is over 137,000,
which  is  far  beyond  the  scope  of  any  known  existing  computer  code.  To 
put  these  figures  in  perspective,  we  note  that  Kennington  reports  that 
the  largest  models  he  has  solved  using  his  primal  partitioning  code, 
MCNF,  have  been  on  the  order  of  3000  rows  [2]. 
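
The node-duplication device used in this section to enforce a facility capacity can be sketched as follows. This is a minimal illustration under stated assumptions: the function `split_node`, the (tail, head, upper bound) arc representation, and the example data are hypothetical and are not taken from Lt. Col. McLain's model.

```python
def split_node(arcs, node, capacity):
    """Replace `node` by an entry copy and an exit copy joined by one arc.

    arcs is a list of (tail, head, upper_bound) triples. Arcs entering
    `node` are redirected to (node, "in"), arcs leaving it start from
    (node, "out"), and the new internal arc carries the node capacity.
    """
    node_in, node_out = (node, "in"), (node, "out")
    new_arcs = []
    for tail, head, ub in arcs:
        if tail == node:
            tail = node_out
        if head == node:
            head = node_in
        new_arcs.append((tail, head, ub))
    # All flow through the facility must cross this single arc.
    new_arcs.append((node_in, node_out, capacity))
    return new_arcs


# Hypothetical example: hospital "A" with 300 beds.
arcs = [("base", "A", 50), ("A", "home", 40)]
for arc in split_node(arcs, "A", 300):
    print(arc)
```

Each capacitated facility therefore adds one node and one arc to the network, which is one reason the time-expanded model grows so quickly.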

Our plan has been to develop a specialized solution procedure
which would solve a scaled-down version of Lt. Col. McLain's model. We
anticipate aggregation of the data, possibly using some of the
following ideas:


(1) Aggregation of the time periods. Note that simply using
3-day time periods instead of 1-day time periods reduces the
problem size to around 46,000 rows.

(2)  Aggregation  of  similar  patient  types. 

(3)  Aggregation  of  U.S.  medical  facilities  so  that  facilities 
which  are  located  within  a  given  number  of  miles  of  one 
another  are  treated  as  one  node. 

At  the  writing  of  this  dissertation  we  have  not  yet  received  any 
large  test  problems  from  the  Air  Force.  As  a  result,  we  are  unable  to 
report  on  the  problem  size  limitations  of  our  technique.  However,  in 
an  attempt  to  test  our  software  on  a  relatively  large  problem,  we 
solved  a  randomly  generated  test  problem  with  around  9,000  rows.  (See 
Chapter  4  for  the  details  of  this  problem.)  This  is  the  largest 
problem  we  have  attempted  so  far. 

1.4  Accomplishments  of  This  Investigation 

This  dissertation  proposes  a  new  technique  for  solving  extremely 
large  multicommodity  network  flow  problems.  Our  method  involves 
generating  upper  bounds  on  the  optimal  objective  value  by  partially 
solving  the  problem  using  a  resource-directive  decomposition  technique, 
and  generating  lower  bounds  on  the  optimal  objective  value  by  partially 
solving  a  Lagrangian  dual  of  the  problem.  Both  the  upper 
and  lower  bound  routines  exploit  the  network  structure  of  the  problem, 
decomposing  it  by  commodities  and  solving  the  resulting  pure  network 
problems.  In  the  limit  both  bounds  must  converge  to  the  optimal 
objective  function  value;  in  practice  we  stop  when  the  difference 
between  the  two  bounds  is  within  some  termination  tolerance. 
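
The stopping rule described above can be sketched in a few lines of Python. The sketch assumes an absolute gap test and takes the two bound sequences as plain lists of made-up numbers; in the actual method each value would come from one pass of the lower or upper bound routine. Because individual bound values need not improve at every iteration, the sketch keeps the best value seen so far on each side.

```python
def bracket_optimum(lower_bounds, upper_bounds, eps):
    """Track the best bounds seen so far and stop once the gap is <= eps."""
    best_lower, best_upper = float("-inf"), float("inf")
    for lb, ub in zip(lower_bounds, upper_bounds):
        best_lower = max(best_lower, lb)   # lower bounds should only rise
        best_upper = min(best_upper, ub)   # upper bounds should only fall
        if best_upper - best_lower <= eps:
            break
    return best_lower, best_upper


# Illustrative bound sequences (hypothetical values).
lowers = [4.0, 7.0, 6.5, 9.0, 9.6]
uppers = [15.0, 12.0, 12.5, 10.5, 10.0]
print(bracket_optimum(lowers, uppers, eps=0.5))
```

Any feasible solution that attains the final upper bound is then guaranteed to have an objective value within eps of the optimum.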


Whether solving for lower bounds or for upper bounds a
subgradient optimization technique is used. At each iteration this
procedure requires the computation of a subgradient, the selection of
a step size, and a projection operation. In Section 3.1, we obtain a
new convergence result for a particular class of subgradient
procedures. Then, in Section 3.2, we introduce a new heuristic, closely
related to the subgradient optimization procedure, which has worked
well for our test problems.
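
The three ingredients of a subgradient iteration named above (a subgradient, a step size, and a projection) can be seen in the toy Python sketch below. Everything in it is an assumption chosen for illustration: the function g, the box region, and the step size 1/t are not the function, region, or step size rule studied in Chapter III.

```python
def project_box(y, lo, hi):
    """Project y onto the box lo <= y_i <= hi by clipping each component."""
    return [min(max(yi, lo), hi) for yi in y]


def g(y):
    """Toy convex, nondifferentiable function to be minimized."""
    return abs(y[0] - 3.0) + abs(y[1] + 1.0)


def subgradient(y):
    """One subgradient of g at y (componentwise sign of the residuals)."""
    def sign(t):
        return (t > 0) - (t < 0)
    return [sign(y[0] - 3.0), sign(y[1] + 1.0)]


def projected_subgradient(y, iterations):
    """Minimize g over the box [0, 2] x [0, 2], keeping the best point seen."""
    best_y, best_val = list(y), g(y)
    for t in range(1, iterations + 1):
        d = subgradient(y)
        step = 1.0 / t  # assumed step size rule, for illustration only
        y = project_box([yi - step * di for yi, di in zip(y, d)], 0.0, 2.0)
        if g(y) < best_val:
            best_y, best_val = list(y), g(y)
    return best_y, best_val


best_y, best_val = projected_subgradient([0.0, 2.0], 10)
print(best_y, best_val)
```

The best point is recorded separately because a subgradient step is not guaranteed to decrease the objective at every iteration.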

Our  technique  has  been  tested  on  randomly  generated  test 
problems  and  on  one  problem  which  was  formulated  specifically  to 
represent  the  class  of  evacuation  planning  problems  for  which  the  code 
was  developed.  In  addition,  the  same  set  of  test  problems  was  solved 
by  MCNF  [51],  a  general  purpose  multicommodity  network  flow  problem 
solver  which  uses  a  primal  partitioning  scheme.  Computation  times  for 
both  codes  are  presented.  Our  code  used  an  average  of  68%  of  the  time 
needed  by  MCNF,  performing  significantly  better  on  the  problems  with 
fewer  commodities.  In  addition  our  code  required  on  the  order  of  1/K 
the  amount  of  main  memory  for  a  K-commodity  problem,  so  it  can  solve 
significantly  larger  problems  than  MCNF. 


CHAPTER II


A  SURVEY  OF  RELATED  LITERATURE 

In  this  chapter  we  present  an  overview  of  the  existing  work  on 
which  this  dissertation  is  based.  Section  2.1  deals  with  the  work  that 
has  been  done  in  the  area  of  pure  network  models.  Then  in  Section  2.2 
we  address  the  broader  area  of  multicommodity  network  methods.  Since 
our  algorithm  involves  a  subgradient  optimization  technique,  both  in 
the  Lagrangian  dual  portion  and  in  the  resource-directive  decomposition 
routine,  we  provide  some  references  involving  subgradient  optimization 
in Section 2.3.

2.1  Pure  Networks 

Network  problems  are  linear  programming  problems  with  node-arc 
incidence  matrices  as  their  constraint  matrices.  Within  this  class, 
known  formally  as  minimal  cost  network  flow  problems,  there  are  several 
variations  including  transportation  problems,  transshipment  problems, 
assignment  problems,  maximal  flow  problems,  and  shortest  path  problems. 

Ideas for solution of network problems can be traced at least as
far back as 1939, to the work of Professor Leonid Kantorovich [41].
Kantorovich, along with Professor Tjalling C. Koopmans, received the
Nobel Prize in Economic Science in 1975 for contributions to the
theory of optimum allocation of scarce resources. Koopmans and Reiter
[34] and Frank L. Hitchcock [42], working independently, were the first
to formulate the transportation problem. The mid-fifties saw a surge
of interest and work in the areas of network algorithms. It was around
this time that Alex Orden [59] generalized the transportation model to
allow transshipment points. Lester Ford and Delbert Fulkerson [22]
[20] formulated and investigated solution techniques for the maximal
flow problem and the minimal cost network flow problem. The
specialized algorithms that have been developed for solving network
problems may be classified into two groups: primal-dual techniques, and
specializations of the primal simplex algorithm. Primal-dual methods
for solving networks began with Harold Kuhn's Hungarian Algorithm for
the assignment problem [55] and culminated in Fulkerson's Out-of-Kilter
Algorithm [23]. Primal simplex based techniques originated with the
work of Professor George Dantzig [17] and continued through Ellis
Johnson's 1965 paper [47]. The basis for Johnson's work can be traced
to the work of Dantzig [18] and Charnes and Cooper [14].

Since that time much work has been done in the area of solution
techniques, and computational advances have been made by the
development of more efficient data structures. The credit for much of this
work goes to Professors Fred Glover and Darwin Klingman and their
colleagues at the University of Texas. This is evidenced by such
papers as Barr, Glover and Klingman [9] [10], Glover, Hultz and
Klingman [26] [25], Glover, Karney and Klingman [27], Glover, Karney,
Klingman and Napier [28], Glover and Klingman [29] [31] [30], Glover,
Klingman and Stutz [32], and Karney and Klingman [49]. Others who have
contributed to the research include Srinivasan and Thompson [63] [64],
Bradley, Brown, and Graves [13], and Mulvey [57] [58]. In addition
significant work has been performed by Professors Jeff Kennington,
Richard Barr, and Richard Helgason of Southern Methodist University as
seen in such works as [3], [41], and [52].

Today network algorithms have been demonstrated to solve linear
network problems 50 times faster than general linear programming
algorithms [6]. Additionally a computer implementation of such a
technique may require only half the memory of the general L.P. package
[6]. These advances are due to the efficient data structures which
have been developed to allow a basis for a network problem to be stored
as a rooted spanning tree on the nodes in the network. Using this idea
all the simplex computations, such as pricing, ratio test, and updates,
can be performed via labelling algorithms on the basis tree. This
eliminates the need to store the basis inverse in factored form.
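
As a rough illustration of how a basis stored as a rooted spanning tree supports simplex computations, the Python sketch below keeps only a predecessor label for each node and traces the unique cycle created when a nonbasic arc enters the basis; the ratio test is carried out along such a cycle. The names `path_to_root` and `basis_cycle` and the five-node tree are hypothetical, and the network codes cited above use additional labels beyond the single predecessor array shown here.

```python
def path_to_root(pred, node):
    """Follow predecessor labels from node up to the root of the basis tree."""
    path = [node]
    while pred[path[-1]] is not None:
        path.append(pred[path[-1]])
    return path


def basis_cycle(pred, tail, head):
    """Nodes on the cycle formed by adding arc (tail, head) to the tree.

    The cycle runs from tail up to the node where the two root paths
    join, then down to head; the entering arc closes it.
    """
    up_tail = path_to_root(pred, tail)
    up_head = path_to_root(pred, head)
    # Discard the portion of both root paths that lies above the join node.
    while (len(up_tail) > 1 and len(up_head) > 1
           and up_tail[-2] == up_head[-2]):
        up_tail.pop()
        up_head.pop()
    # up_head[-1] is the join node, already the last entry of up_tail.
    return up_tail + up_head[-2::-1]


# Hypothetical basis tree rooted at node 1.
pred = {1: None, 2: 1, 3: 1, 4: 2, 5: 2}
print(basis_cycle(pred, 4, 5))
print(basis_cycle(pred, 4, 3))
```

No matrix is stored or inverted in this sketch; the tree labels alone identify the arcs whose flows change when the entering arc is pivoted in.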


2.2  Multicommodity  Networks 

Multicommodity network flow problems are problems in which
several different types of items (commodities) must share arcs in a
capacitated network. Each solution technique for multicommodity
network models can be classified as one of two main types of
algorithms: partitioning algorithms and decomposition algorithms.


2.2.1 Partitioning Algorithms


Partitioning algorithms are specializations of the simplex method
which exploit the multicommodity network structure by partitioning the
basis into more than one part. In one part advantage is taken of the
special network structure. Those who have studied primal partitioning
algorithms include Kennington [50], Helgason and Kennington [40], Ali,
Helgason, Kennington, and Lall [4], Hartman and Lasdon [36] [35],
Maier [56], and Saigal [61]. Ali and Kennington [6], in their
computational research, reported solution times averaging 5 times
faster than general linear programming codes. A dual partitioning
method was proposed by Grigoriadis and White [34]. A primal-dual
partitioning scheme was developed by Jewell [46]. In addition a
factorization technique was proposed by Graves and McBride [33]. MCNF,
the multicommodity network code with which we compared our solution
times, is a primal partitioning program.

2.2.2  Decomposition  Algorithms 

Decomposition  schemes  seek  to  solve  the  problem  by  decomposing  it 
into  several  smaller  subproblems,  each  of  which  takes  the  form  of  a 
pure  minimum  cost  network  flow  problem.  A  master  program  coordinates 
the  solution  process.  Decomposition  procedures  for  the  multicommodity 
network  flow  problem  fall  into  two  categories:  price-directive  schemes 
and  resource-directive  schemes. 

Price-directive  decomposition  is  based  on  the  well-known  research 
of  Dantzig  and  Wolfe  [19].  In  a  price-directive  approach,  the  K- 
commodity  problem  is  decomposed  into  K  single  commodity  problems.  The 
master  program  then  uses  the  simplex  method  while  the  subproblems  test 
for  optimality  and  select  candidates  to  enter  the  basis  of  the  master 
problem. Ford and Fulkerson [21] were the first to develop this
approach for solving multicommodity network flow problems. Tomlin [67]
was  the  first  to  develop  a  computer  code  implementing  this  technique. 
Others  who  have  studied  price-directive  decomposition  schemes  are 
Jarvis  [43],  Jarvis  and  Keith  [44],  Chen  and  DeWald  [15],  and  Jarvis 
and  Martinez  [45].  Price-directive  approaches  for  generalizations  of 
this  problem  have  been  proposed  by  Cremeans,  Smith,  and  Tyndall  [16], 
Swoveland  [65]  [66],  Weigel  and  Cremeans  [68],  and  Wollmer  [69]. 

Resource-directive  decomposition  schemes  decompose  the  problem  by 
commodities,  and  the  master  problem  systematically  distributes  the 
mutual  arc  capacities  among  the  commodities.  At  each  iteration  the 
optimal  solutions  to  the  single  commodity  subproblems  are  used  to 
compute  a  new  set  of  allocations.  Robacker  [60]  was  the  first  to 
suggest  this  approach  for  multicommodity  network  problems.  Research  on 
this  technique  has  been  presented  by  Swoveland  [65],  Assad  [8],  Ali, 
Helgason, Kennington, and Lall [3], and Kennington and Shalaby [53].

2.3  Subgradient  Optimization 

The  subgradient  optimization  technique  was  first  proposed  by  Shor 
[62]  in  1964.  Since  that  time  subgradient  algorithms  have  been  applied 
to  many  different  optimization  problems.  Held  and  Karp  [37]  and  Held, 
Wolfe  and  Crowder  [38]  made  use  of  the  approach  in  solving  the 
symmetric  travelling  salesman  problem.  Bazaraa  and  Goode  [11]  applied 
the algorithm to the asymmetric travelling salesman problem.
Subgradient methods have been used to solve the assignment problem [38].
Glover, Glover, and Martinson [24] applied a subgradient technique to
solve a special network with side constraints, and Ali and Kennington
[7] made use of it in research involving the m-travelling salesman
problem.


CHAPTER  III 


THE  ALGORITHM 

Here  we  present  a  new  solution  technique  for  the  multicommodity 
network  flow  problem.  This  technique  involves  finding  successively 
better  upper  and  lower  bounds  on  the  optimal  objective  function  value. 
The  algorithm  terminates  whenever  the  two  bounds  are  within  a  prescribed 
tolerance  or  when  it  can  be  shown  that  the  current  solution  is  an  exact 
optimum. 

Lower  bounds  are  generated  by  partially  solving  a  Lagrangian  dual. 
At  each  iteration  a  Lagrangian  relaxation  of  the  original  problem  is 
solved;  since  these  relaxations  decompose  on  commodities,  only  a 
(single-commodity)  minimum  cost  network  flow  optimizer  is  needed.  A 
subgradient  direction  is  used  to  adjust  the  Lagrange  multipliers  for  the 
next  iteration. 

Upper  bounds  are  generated  using  a  modification  of  the  resource- 
directive  decomposition  technique  first  suggested  by  Robacker  [60].  We 
introduce  a  specialization  of  the  subgradient  direction  approach  which 
was  first  applied  to  this  class  of  problems  by  Held,  Wolfe,  and  Crowder 
[38]. 

With  minor  restrictions  on  the  step  sizes  we  show  that  both  the 
upper  and  lower  bounds  converge  to  the  optimal  objective  value  of  the 
original  multicommodity  network  flow  problem.  Hence  in  the  limit  the 
algorithm will converge to an exact optimum. In practice we seek a
near-optimum.


3.1  Subgradient  Optimization 

Let us first consider the general subgradient algorithm for
optimization of convex functions; later we will present specializations
of the technique for the upper and lower bound problems. Consider the
nonlinear programming problem

    Minimize    $g(y)$
    Subject to  $y \in \Gamma$

where $g$ is a real valued function that is convex over the compact,
convex, nonempty set $\Gamma$. A vector $\eta$ is called a subgradient of $g$ at a
point $x$ if

    $g(y) - g(x) \ge \eta (y - x)$  for all $y \in \Gamma$.

Note that if $g$ is differentiable at $x$, the only subgradient at $x$ is the
gradient. We denote the set of all subgradients of $g$ at $x$ by $\partial g(x)$.

The subgradient algorithm proceeds in this manner: Given a point
$x$ in $\Gamma$, find a subgradient of $g$ at $x$, obtain a new point by moving a
given step size in the negative subgradient direction, and finally
project this new point back onto $\Gamma$. This projection operation takes a
point $x$ and finds the point in $\Gamma$ that is "closest" to $x$ with respect to
the Euclidean norm. We denote the projection of $x$ onto $\Gamma$ by $P[x]$.

Using this notation we present the general subgradient optimization
algorithm for minimizing a convex function $g$ [32].


ALGORITHM 3.1  SUBGRADIENT OPTIMIZATION ALGORITHM

Step 0  (Initialization)

    Let $y_0$ be any element of $\Gamma$. Select a set of step sizes,
    $s_0, s_1, s_2, \ldots$, and set $i \leftarrow 0$.

Step 1  (Find Subgradient)

    Let $\eta_i \in \partial g(y_i)$. If $\eta_i = 0$, terminate with $y_i$ optimal.

Step 2  (Move to New Point)

    Set $y_{i+1} \leftarrow P[y_i - s_i \eta_i]$. Set $i \leftarrow i + 1$. Return to step 1.
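The three steps of Algorithm 3.1 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the box-shaped feasible set, the test function $g(y) = |y_1 - 1| + |y_2 + 2|$, and the step rule (which uses the known optimal value $g^* = 0$) are all assumptions chosen for the example, not part of the original work.

```python
def subgradient_method(g, subgrad, project, y0, step, max_iter=100):
    """Projected subgradient iteration in the spirit of Algorithm 3.1."""
    y = list(y0)
    best_y, best_val = list(y), g(y)
    for i in range(max_iter):
        eta = subgrad(y)                      # Step 1: find a subgradient
        if all(c == 0 for c in eta):          # zero subgradient: y is optimal
            return y, g(y)
        s = step(i, y, eta)
        # Step 2: move against the subgradient, then project back
        y = project([yj - s * ej for yj, ej in zip(y, eta)])
        if g(y) < best_val:                   # iterates need not descend,
            best_y, best_val = list(y), g(y)  # so remember the best one
    return best_y, best_val


# Hypothetical example: minimize |y1 - 1| + |y2 + 2| over the box [-5, 5]^2.
def g(y):
    return abs(y[0] - 1) + abs(y[1] + 2)

def subgrad(y):
    sign = lambda t: (t > 0) - (t < 0)
    return [sign(y[0] - 1), sign(y[1] + 2)]

def project_box(y, lo=-5.0, hi=5.0):
    """Euclidean projection onto a box (stand-in for the set Gamma)."""
    return [min(max(v, lo), hi) for v in y]

def target_step(i, y, eta, lam=1.0, g_star=0.0):
    """Step size lam * (g(y) - g*) / ||eta||^2 with a known optimal value."""
    return lam * (g(y) - g_star) / sum(e * e for e in eta)

y_best, g_best = subgradient_method(g, subgrad, project_box, [4.0, 4.0], target_step)
```

On this small example the iterates reach the minimizer in two moves; on realistic problems the optimal value is unknown and must be replaced by a bound, which is the situation the later propositions address.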


Let us now turn our attention to the selection of step sizes.
Several ideas for choosing step sizes have been proposed. These
typically involve a sequence of constants, $\{\lambda_i\}$, which satisfy
the following conditions:

    $\lambda_i > 0$ for all $i$,
    $\lim_{i \to \infty} \lambda_i = 0$, and                         (3.1)
    $\sum_i \lambda_i = \infty$.

The subgradient algorithm can be shown to converge when any of the
following three formulae are used for determining step sizes [52]:

    (i)    $s_i = \lambda_i$,
    (ii)   $s_i = \lambda_i / \|\eta_i\|^2$,                              (3.2)
    (iii)  $s_i = \lambda_i [g(y_i) - g^*] / \|\eta_i\|^2$,

where $g^*$ denotes the optimal objective value.

Propositions 3.1, 3.2, and 3.3 may be found in Kennington and
Helgason [32], and are given here as necessary preliminary results.
Proposition 3.1 [32]

Let $y \in \Gamma$, and let $x \in R^n$. Then $(x - P[x])(y - P[x]) \le 0$.

Proof

Choose $\alpha$ so that $0 < \alpha < 1$. Since $\Gamma$ is convex, $\alpha y + (1-\alpha)P[x] \in \Gamma$. By
the definition of $P[x]$, $\|x - P[x]\| \le \|x - (\alpha y + (1-\alpha)P[x])\|$. Thus

    $\|x - P[x]\|^2 \le \|x - (\alpha y + (1-\alpha)P[x])\|^2$
                  $= \|x - P[x] - \alpha(y - P[x])\|^2$
                  $= \|x - P[x]\|^2 + \alpha^2 \|y - P[x]\|^2 - 2\alpha (x - P[x])(y - P[x])$.

Then $(x - P[x])(y - P[x]) \le \|y - P[x]\|^2 \alpha / 2$. And, since $\alpha$ can be taken
arbitrarily close to 0,

    $(x - P[x])(y - P[x]) \le 0$.  ■

Proposition 3.2 [32]

Let $x, y \in R^n$. Then $\|P[x] - P[y]\| \le \|x - y\|$.

Proof

Case 1: Suppose $P[x] = P[y]$. Then
$\|P[x] - P[y]\| = 0 \le \|x - y\|$.

Case 2: Suppose $P[x] \ne P[y]$. Then since $P[x] \in \Gamma$
and $P[y] \in \Gamma$, from Proposition 3.1 we have that

    $(x - P[x])(P[y] - P[x]) \le 0$
    $(y - P[y])(P[x] - P[y]) \le 0$.

We may rewrite the above inequalities as

    $x(P[y] - P[x]) - P[x]P[y] + \|P[x]\|^2 \le 0$
    $y(P[x] - P[y]) - P[y]P[x] + \|P[y]\|^2 \le 0$.

Adding these inequalities, we obtain

    $(x - y)(P[y] - P[x]) + \|P[y] - P[x]\|^2 \le 0$.

Then from the Cauchy-Schwarz inequality,

    $-(x - y)(P[y] - P[x]) \le \|x - y\| \, \|P[y] - P[x]\|$

so that

    $\|P[y] - P[x]\|^2 \le \|x - y\| \, \|P[y] - P[x]\|$.

And since $P[x] \ne P[y]$,

    $\|P[x] - P[y]\| \le \|x - y\|$.  ■


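Proposition 3.3 [32]

If $\eta_i \ne 0$, then, for any $y \in \Gamma$,

    $\|y_{i+1} - y\|^2 \le \|y_i - y\|^2 + s_i^2 \|\eta_i\|^2 + 2 s_i \eta_i (y - y_i)$.

Proof

Let $i$ be any iteration of the subgradient algorithm. Suppose
$\eta_i \ne 0$. Let $y \in \Gamma$. Then, by Proposition 3.2,

    $\|P[y_i - s_i \eta_i] - P[y]\|^2 \le \|y_i - s_i \eta_i - y\|^2$
                                    $= \|y_i - y\|^2 + s_i^2 \|\eta_i\|^2 + 2 s_i \eta_i (y - y_i)$.

Since $P[y] = y$ and $P[y_i - s_i \eta_i] = y_{i+1}$, we have that

    $\|y_{i+1} - y\|^2 \le \|y_i - y\|^2 + s_i^2 \|\eta_i\|^2 + 2 s_i \eta_i (y - y_i)$.  ■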


Our main convergence result is for the particular step size
scheme:

    $s_i = \lambda_i [g(y_i) - \underline{g}] / \|\eta_i\|^2$

where $\underline{g}$ is a lower bound for the optimal objective and where we are at
liberty to select bounds $\alpha$ and $\beta$ for the $\{\lambda_i\}$ such that for each $i$,
$0 < \alpha \le \lambda_i \le \beta < 2$.

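Proposition 3.4

Let (i)   $\underline{g}$ be a known lower bound for the optimal objective, $g^*$,
          with $g^* > \underline{g}$;
    (ii)  $\{\lambda_i\}$ be any infinite sequence such that
          for all $i$, $0 < \alpha \le \lambda_i \le \beta < 2$; and
    (iii) $s_i = \lambda_i [g(y_i) - \underline{g}] / \|\eta_i\|^2$.

If there is a constant $C$ such that for all $i$, $\|\eta_i\| \le C$, and if $\gamma > 0$
is given, then there is some $n$ such that
$g(y_n) \le g^* + [\beta/(2-\beta)](g^* - \underline{g}) + \gamma$.

Proof

Let $\gamma > 0$ be given. Let (i), (ii), and (iii) hold. Let $y^*$ be an
optimal point, and for all $i$, $\|\eta_i\| \le C$. Suppose, contrary to the
desired result, that for all $n$, $g(y_n) > g^* + [\beta/(2-\beta)](g^* - \underline{g}) + \gamma$. Then, by
Proposition 3.3,

    $\|y_{i+1} - y^*\|^2 \le \|y_i - y^*\|^2 + \lambda_i^2 [g(y_i) - \underline{g}]^2 / \|\eta_i\|^2$
                          $+ 2\lambda_i \{[g(y_i) - \underline{g}] / \|\eta_i\|^2\} \eta_i (y^* - y_i)$
                      $\le \|y_i - y^*\|^2 + \lambda_i^2 [g(y_i) - \underline{g}]^2 / \|\eta_i\|^2$
                          $+ 2\lambda_i \{[g(y_i) - \underline{g}] / \|\eta_i\|^2\} [g^* - g(y_i)]$,

since $\eta_i \in \partial g(y_i)$.

Since $\beta \ge \lambda_i > 0$, then $\beta \lambda_i \ge \lambda_i^2$. So,

    $\|y_{i+1} - y^*\|^2 \le \|y_i - y^*\|^2 + \beta \lambda_i [g(y_i) - \underline{g}]^2 / \|\eta_i\|^2$
                          $+ 2\lambda_i \{[g(y_i) - \underline{g}] / \|\eta_i\|^2\} [g^* - g(y_i)]$
                      $= \|y_i - y^*\|^2 + (2-\beta) \lambda_i \{[g(y_i) - \underline{g}] / \|\eta_i\|^2\}$
                          $\times [(g^* - g(y_i)) + (\beta/(2-\beta))(g^* - \underline{g})]$.

Since $g(y_i) > g^* + (\beta/(2-\beta))(g^* - \underline{g}) + \gamma$, then $-\gamma > g^* - g(y_i) +
(\beta/(2-\beta))(g^* - \underline{g})$. So,

    $\|y_{i+1} - y^*\|^2 < \|y_i - y^*\|^2 - (2-\beta) \lambda_i [g(y_i) - \underline{g}] \gamma / \|\eta_i\|^2$.

Since $g^* \le g(y_i)$, $\alpha \le \lambda_i$, and $\|\eta_i\| \le C$, then

    $\|y_{i+1} - y^*\|^2 < \|y_i - y^*\|^2 - [(2-\beta) \alpha (g^* - \underline{g}) \gamma] / C^2$.        (3.3)

We can choose an integer $N$ so large that

    $C^2 \|y_1 - y^*\|^2 / [(2-\beta) \alpha (g^* - \underline{g}) \gamma] < N$.

Thus, since $2 - \beta > 0$ and $g^* - \underline{g} > 0$,

    $N (2-\beta) \alpha (g^* - \underline{g}) \gamma / C^2 > \|y_1 - y^*\|^2$.

Adding together the inequalities obtained from (3.3) by letting $i$ take
on all values from 1 to $N$, we obtain

    $\|y_{N+1} - y^*\|^2 < \|y_1 - y^*\|^2 - N (2-\beta) \alpha (g^* - \underline{g}) \gamma / C^2 < 0$,

a contradiction.  ■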


It is shown in [39] that when $\Gamma$ is compact, $g$ is continuous on some open
set containing $\Gamma$, and $\partial g(y) \ne \emptyset$ for all $y \in \Gamma$, there exists a
constant $C$ such that $\|\eta\| \le C$ for all $y \in \Gamma$ and $\eta \in \partial g(y)$, so that the
boundedness condition on the subgradients in Proposition 3.4 is easily
met.
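As a quick reading of the bound in Proposition 3.4, consider a worked instance. The specific numbers are assumptions made only for illustration: every $\lambda_i = 1$ (so $\alpha = \beta = 1$), $g^* > 0$, and a lower bound $\underline{g} = 0.95\,g^*$.

```latex
\[
g(y_n) \;\le\; g^* + \frac{\beta}{2-\beta}\,(g^* - \underline{g}) + \gamma
       \;=\; g^* + \frac{1}{2-1}\,(g^* - 0.95\,g^*) + \gamma
       \;=\; 1.05\,g^* + \gamma .
\]
```

Under these assumptions some iterate comes within about 5% of the optimal value, up to the arbitrary margin $\gamma$. A tighter lower bound or a smaller $\beta$ shrinks the factor multiplying $g^* - \underline{g}$ and therefore tightens the guarantee.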


3.2  Generating  Lower  Bounds 

In  this  section  we  present  a  technique  for  generating  lower  bounds 
for  the  multicommodity  network  flow  problem.  This  technique  involves 
partially  solving  the  Lagrangian  dual  problem  using  a  subgradient 
technique  to  update  the  Lagrange  multipliers  at  each  iteration. 

Recall that the multicommodity network flow problem, MP, may be
stated as follows:

    Minimize    $\sum_k c^k x^k$
    Subject to  $A x^k = r^k$,  $k = 1,\ldots,K$                        (MP)
                $\sum_k x^k \le u$
                $0 \le x^k \le v^k$,  $k = 1,\ldots,K$

where

    $A$ is an $m \times n$ node-arc incidence matrix,
    $c^k$ is an $n$ vector of unit costs for $k = 1,\ldots,K$,
    $r^k$ is an $m$ vector of node requirements for $k = 1,\ldots,K$,
    $u$ is an $n$ vector of mutual arc capacities,
    $v^k$ is an $n$ vector of individual commodity bounds for $k = 1,\ldots,K$,
    $x^k$ is an $n$ vector of decision variables for $k = 1,\ldots,K$,

and $K$ is the number of commodities.


Consider a Lagrangian dual problem for MP, denoted by DP:

    MAX  $h(\lambda)$  subject to  $\lambda \ge 0$,

    $h(\lambda) = \mathrm{MIN}[\sum_k c^k x^k + \lambda(\sum_k x^k - u) :$                    (DP)
             $A x^k = r^k\ (k = 1,\ldots,K);\ 0 \le x^k \le v^k\ (k = 1,\ldots,K)]$

where $\lambda$ is an $n$ vector of Lagrange multipliers.

First we show that any feasible solution for DP is a lower bound
for MP.

Proposition 3.5 [12]

Let $\bar{x} = (\bar{x}^1, \bar{x}^2, \ldots, \bar{x}^K)$ be a feasible solution for MP. Let
$\bar{\lambda}$ be a feasible solution for DP. Then $h(\bar{\lambda}) \le c\bar{x}$.

Proof

Since $h(\bar{\lambda})$ is a minimum, and since $\bar{x}$ is feasible for MP,
$h(\bar{\lambda}) \le \sum_k c^k \bar{x}^k + \bar{\lambda}(\sum_k \bar{x}^k - u)$. Further, since $\bar{\lambda}$ is feasible
for DP and $\bar{x}$ is feasible for MP, then $\bar{\lambda}(\sum_k \bar{x}^k - u) \le 0$.
Hence $h(\bar{\lambda}) \le c\bar{x}$.  ■

In  addition  to  this  result,  Bazaraa  and  Shetty  [12]  have  proved 
that  if  MP  has  an  optimal  solution,  then  DP  has  an  optimal  solution,  and 
that  their  optimal  objective  function  values  are  equal.  As  a  result,  we 
see  that  we  may  indeed  solve  (or  partially  solve)  DP  in  order  to  obtain 
a  lower  bound  for  MP. 

In  order  to  justify  using  a  subgradient  optimization  technique  for 
solving  DP,  we  must  show  that  the  objective  function  is  concave  and 
develop  an  expression  for  a  subgradient. 


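Proposition 3.6

The real valued function $h$ is concave over $\Lambda = \{\lambda : \lambda \in R^n;\ \lambda \ge 0\}$.

Proof

Let $\lambda^1 \ge 0$. Let $\lambda^2 \ge 0$. Let $0 < \alpha < 1$. Then

    $h(\alpha\lambda^1 + (1-\alpha)\lambda^2)$
      $= \mathrm{MIN}[\sum_k c^k x^k + (\alpha\lambda^1 + (1-\alpha)\lambda^2)(\sum_k x^k - u) :$
              $A x^k = r^k\ (k = 1,\ldots,K);\ 0 \le x^k \le v^k\ (k = 1,\ldots,K)]$
      $= \mathrm{MIN}[\alpha \sum_k c^k x^k + \alpha\lambda^1(\sum_k x^k - u) + (1-\alpha) \sum_k c^k x^k + (1-\alpha)\lambda^2(\sum_k x^k - u) :$
              $A x^k = r^k\ (k = 1,\ldots,K);\ 0 \le x^k \le v^k\ (k = 1,\ldots,K)]$
      $\ge \alpha\,\mathrm{MIN}[\sum_k c^k x^k + \lambda^1(\sum_k x^k - u) :$
              $A x^k = r^k\ (k = 1,\ldots,K);\ 0 \le x^k \le v^k\ (k = 1,\ldots,K)]$
        $+ (1-\alpha)\,\mathrm{MIN}[\sum_k c^k x^k + \lambda^2(\sum_k x^k - u) :$
              $A x^k = r^k\ (k = 1,\ldots,K);\ 0 \le x^k \le v^k\ (k = 1,\ldots,K)]$
      $= \alpha h(\lambda^1) + (1-\alpha) h(\lambda^2)$.

Hence $h$ is concave over $\Lambda$.  ■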

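Proposition 3.7

Let $\bar{\lambda} \ge 0$. Let $\bar{x}$ represent an optimal value of $x$ corresponding
to $h(\bar{\lambda})$. Then $d = \sum_k \bar{x}^k - u$ is a subgradient of $h$ at $\bar{\lambda}$.

Proof

Let $\lambda$ be any other point in $\Lambda$ with corresponding optimal decision
variable values $x$. Then

    $h(\lambda) = \sum_k c^k x^k + \lambda(\sum_k x^k - u)$
           $\le \sum_k c^k \bar{x}^k + \lambda(\sum_k \bar{x}^k - u)$    (since $x$ is optimal)
           $= \sum_k c^k \bar{x}^k + \bar{\lambda}(\sum_k \bar{x}^k - u) + (\lambda \sum_k \bar{x}^k - \bar{\lambda} \sum_k \bar{x}^k) - (\lambda u - \bar{\lambda} u)$
           $= \sum_k c^k \bar{x}^k + \bar{\lambda}(\sum_k \bar{x}^k - u) + (\sum_k \bar{x}^k - u)(\lambda - \bar{\lambda})$
           $= h(\bar{\lambda}) + d(\lambda - \bar{\lambda})$.

Therefore $d$ is a subgradient of $h$ at $\bar{\lambda}$.  ■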

We now present our algorithm for computing lower bounds for MP.
Note that it is a specialization of the subgradient optimization
algorithm for this problem, and its convergence follows as a
maximization analog of Proposition 3.4.


ALGORITHM 3.2  LOWER BOUND ALGORITHM

Step 0  (Initialization)

    Let UB be any upper bound on the solution to MP. Set $i \leftarrow 0$;
    $\lambda_0 \leftarrow 0$; $\alpha_0 \leftarrow 2$. Compute $y_0 \leftarrow h(\lambda_0)$ and let $x_0 = (x_0^1,\ldots,x_0^K)$ be the
    corresponding optimal values of the decision variables.

Step 1  (Find Subgradient)

    Set $\eta_i \leftarrow \sum_k x_i^k - u$. If $\eta_i = 0$, stop with $y_i$ optimal.

Step 2  (Move to New Point)

    Set $s_i \leftarrow \alpha_i (UB - y_i) / \|\eta_i\|^2$. Compute the $j$th component of
    $\lambda_{i+1}$ as

        $(\lambda_{i+1})_j \leftarrow \mathrm{MAX}\{(\lambda_i)_j + s_i (\eta_i)_j,\ 0\}$.

    Compute $y_{i+1} \leftarrow h(\lambda_{i+1})$ and let $x_{i+1}$ be the corresponding optimal values
    of the decision variables. Set $\alpha_{i+1} \leftarrow \alpha_i / 2$. Set $i \leftarrow i + 1$. Return to
    step 1.
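The lower bound iteration can be illustrated on a toy instance. The sketch below is a hedged illustration, not the EVAC implementation: the instance (two commodities, each shipping one unit over two parallel arcs), the helper names, and the trivial "cheapest arc" subproblem solver are assumptions standing in for a real single-commodity network optimizer. The optimal value of this toy MP is 4, which is used here as the upper bound UB.

```python
def cheapest_arc_flow(cost, lam):
    """Toy subproblem: ship one unit over the cheapest of several parallel
    arcs, using costs adjusted by the multipliers lam."""
    reduced = [c + l for c, l in zip(cost, lam)]
    j_best = reduced.index(min(reduced))
    return [1.0 if j == j_best else 0.0 for j in range(len(cost))]

def dual_value(costs, u, lam):
    """Return h(lam) and the subgradient eta = sum_k x^k - u."""
    flows = [cheapest_arc_flow(c, lam) for c in costs]
    eta = [sum(x[j] for x in flows) - u[j] for j in range(len(u))]
    h = sum(c[j] * x[j] for c, x in zip(costs, flows) for j in range(len(u)))
    h += sum(l * e for l, e in zip(lam, eta))
    return h, eta

def lower_bound(costs, u, ub, iterations=20):
    lam = [0.0] * len(u)            # Step 0: lambda_0 = 0, alpha_0 = 2
    alpha = 2.0
    best = float("-inf")
    for _ in range(iterations):
        y, eta = dual_value(costs, u, lam)
        best = max(best, y)         # every h(lam) is a valid lower bound
        norm2 = sum(e * e for e in eta)
        if norm2 == 0:              # Step 1: zero subgradient, stop
            break
        s = alpha * (ub - y) / norm2                      # Step 2
        lam = [max(l + s * e, 0.0) for l, e in zip(lam, eta)]
        alpha /= 2.0                # halved every iteration, as in Step 2
    return best, lam

costs = [[1.0, 3.0], [1.0, 4.0]]    # unit costs c^k on arcs 0 and 1
u = [1.0, 2.0]                      # mutual arc capacities
lb, lam = lower_bound(costs, u, ub=4.0)
```

Because the multiplier is halved at every pass, the bound on this instance stalls below the optimum, which is consistent with updating the multiplier less aggressively in practice.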

3.3  Generating  Upper  Bounds 

Here  we  describe  a  procedure  for  generating  upper  bounds  for  the 
multicommodity  network  flow  problem.  This  procedure  is  a  specialization 
of the resource-directive decomposition (RDD) algorithm using a
subgradient direction. First we describe the general RDD procedure; then
we present our specialization.

The RDD technique produces a sequence of feasible solutions by
distributing  the  mutual  arc  capacity  among  commodities  in  such  a  way 
that  the  solutions  to  the  K  individual  subproblems  provide  a  solution  to 
the  composite  problem.  At  each  iteration  an  allocation  is  made  and  the 
resulting  K  (single  commodity)  minimum  cost  network  flow  problems  are 
solved.  If  the  solution  meets  an  optimality  criterion  then  the 
procedure  terminates;  otherwise,  a  new  allocation  is  made,  and  the 
process  is  repeated. 

After introducing artificial variables, $(a^k)$, MP becomes:

    Minimize    $\sum_k c^k x^k + M \sum_k \mathbf{1} a^k$
    Subject to  $A x^k + a^k = r^k$  $(k = 1,\ldots,K)$
                $\sum_k x^k \le u$
                $0 \le x^k \le v^k$,  $a^k \ge 0$  $(k = 1,\ldots,K)$

where $M$ is a very large positive number and $\mathbf{1}$ is an $m$ vector of all
ones.

Let us restate the problem as:

    Minimize    $z(y^1,\ldots,y^K)$
    Subject to  $z(y^1,\ldots,y^K) = \sum_k z^k(y^k)$                      (RP)
                $\sum_k y^k = u$
                $0 \le y^k \le v^k$  $(k = 1,\ldots,K)$

where $z^k(y^k) = \mathrm{MIN}\{c^k x^k + M \mathbf{1} a^k : A x^k + a^k = r^k;\ 0 \le x^k \le y^k;\ a^k \ge 0\}$ for
$k = 1,\ldots,K$. We shall refer to this formulation as RP. Note that
$z^k(y^k) = \mathrm{MAX}\{r^k \mu^k - y^k \nu^k : \mu^k A - \nu^k \le c^k;\ \mu^k \le M \mathbf{1};\ \nu^k \ge 0\}$, by duality theory.

In order to justify using a subgradient optimization technique we
must show that $z(y^1,\ldots,y^K)$ is a convex function and develop an
expression for a subgradient.

Proposition 3.8 [32]

The real valued function $z$ is convex over
$Y = \{(y^1,\ldots,y^K) : y^1 \ge 0;\ \ldots;\ y^K \ge 0\}$.

Proof

Let $(y^1,\ldots,y^K) \in Y$ and $(\bar{y}^1,\ldots,\bar{y}^K) \in Y$. Select $\alpha$ so that
$0 < \alpha < 1$. Then

    $z[\alpha y^1 + (1-\alpha)\bar{y}^1, \ldots, \alpha y^K + (1-\alpha)\bar{y}^K]$
      $= \sum_k z^k[\alpha y^k + (1-\alpha)\bar{y}^k]$
      $= \sum_k \mathrm{MAX}\{r^k \mu^k - [\alpha y^k + (1-\alpha)\bar{y}^k] \nu^k :$
              $\mu^k A - \nu^k \le c^k;\ \mu^k \le M \mathbf{1};\ \nu^k \ge 0\}$
      $= \sum_k \mathrm{MAX}\{\alpha[r^k \mu^k - y^k \nu^k] + (1-\alpha)[r^k \mu^k - \bar{y}^k \nu^k] :$
              $\mu^k A - \nu^k \le c^k;\ \mu^k \le M \mathbf{1};\ \nu^k \ge 0\}$
      $\le \alpha \sum_k \mathrm{MAX}\{r^k \mu^k - y^k \nu^k :$
              $\mu^k A - \nu^k \le c^k;\ \mu^k \le M \mathbf{1};\ \nu^k \ge 0\}$
        $+ (1-\alpha) \sum_k \mathrm{MAX}\{r^k \mu^k - \bar{y}^k \nu^k :$
              $\mu^k A - \nu^k \le c^k;\ \mu^k \le M \mathbf{1};\ \nu^k \ge 0\}$
      $= \alpha z(y^1,\ldots,y^K) + (1-\alpha) z(\bar{y}^1,\ldots,\bar{y}^K)$.

Therefore $z$ is convex over $Y$.  ■


Proposition 3.9 [52]

Let $\bar{y} = (\bar{y}^1,\ldots,\bar{y}^K) \in Y$ be any allocation and let $(\bar{\mu}^k, \bar{\nu}^k)$
denote the corresponding optimal solution to $z^k(\bar{y}^k)$ for $k = 1,\ldots,K$.
Then $\eta = (-\bar{\nu}^1,\ldots,-\bar{\nu}^K)$ is a subgradient of $z$ at $\bar{y}$.

Proof

Let $y = (y^1,\ldots,y^K) \in Y$ be any allocation and let $(\mu^k, \nu^k)$ denote the
corresponding optimal solution to $z^k(y^k)$ for $k = 1,\ldots,K$. Then:

    $z(y^1,\ldots,y^K) - z(\bar{y}^1,\ldots,\bar{y}^K)$
      $= \sum_k (r^k \mu^k - y^k \nu^k) - \sum_k (r^k \bar{\mu}^k - \bar{y}^k \bar{\nu}^k)$
      $\ge \sum_k (r^k \bar{\mu}^k - y^k \bar{\nu}^k) - \sum_k (r^k \bar{\mu}^k - \bar{y}^k \bar{\nu}^k)$
      $= \sum_k (-\bar{\nu}^k)(y^k - \bar{y}^k)$.

Hence $\eta = (-\bar{\nu}^1,\ldots,-\bar{\nu}^K)$ is a subgradient of $z$ at $\bar{y}$.  ■


Recall that the subgradient optimization algorithm requires a
technique for projecting a point onto the feasible region. We now
explore the projection operation for this problem.

Let us denote the feasible region for RP by $\Omega$. That is,
$\Omega = \{(y^1,\ldots,y^K) : \sum_k y^k = u;\ 0 \le y^k \le v^k\ (k = 1,\ldots,K)\}$. Given an
arbitrary allocation, $(\bar{y}^1,\ldots,\bar{y}^K)$, to project it onto $\Omega$, we solve

    MIN $\{ \|(y^1,\ldots,y^K) - (\bar{y}^1,\ldots,\bar{y}^K)\| : y \in \Omega \}$
      = MIN $\{ [\sum_k \sum_j (y_j^k - \bar{y}_j^k)^2]^{1/2} : y \in \Omega \}$.

Or, equivalently, we can solve:

    MIN $\{ \sum_k \sum_j (y_j^k - \bar{y}_j^k)^2 : y \in \Omega \}$.

Note that this problem decomposes on $j$. Hence, for each arc $j$, we
solve:

    MIN $\{ \sum_k (y_j^k - \bar{y}_j^k)^2 : \sum_k y_j^k = u_j;\ 0 \le y_j^k \le v_j^k\ (k = 1,\ldots,K) \}$.

We will denote the above projection problem by P. The following
algorithm [52] is used to solve P for any arc, $j$.


ALGORITHM 3.3  PROJECTION ALGORITHM

Step 0  (Initialization)

    If $u_j > \sum_k v_j^k$ or $u_j < 0$, terminate with no feasible
    solution. Otherwise set $l \leftarrow 1$; $r \leftarrow 2K$; $L \leftarrow \sum_k v_j^k$; $R \leftarrow 0$. Compute
    the breakpoints, $b_i$ $(i = 1,\ldots,2K)$, as $\bar{y}_j^k$ and $\bar{y}_j^k - v_j^k$ $(k = 1,\ldots,K)$.
    Order the breakpoints so that $b_1 \le b_2 \le \ldots \le b_{2K}$.

Step 1  (Test for Bracketing)

    If $r - l = 1$ go to step 4; otherwise, set $m \leftarrow [(l + r)/2]$, where $[k]$
    is the greatest integer $\le k$.

Step 2  (Compute New Value)

    Set $C \leftarrow \sum_k \mathrm{MAX}\{\mathrm{MIN}[\bar{y}_j^k - b_m,\ v_j^k],\ 0\}$.

Step 3  (Update)

    If $C = u_j$ then set $\lambda^* \leftarrow b_m$ and go to step 5. If $C > u_j$ then
    set $l \leftarrow m$; $L \leftarrow C$; and go to step 1. If $C < u_j$ then set $r \leftarrow m$; $R \leftarrow C$; and go to
    step 1.

Step 4  (Interpolate)

    Set $\lambda^* \leftarrow b_l + [(b_r - b_l)(u_j - L)] / (R - L)$.

Step 5

    Compute the feasible (projected) allocation, $y_j^k$, for
    $k = 1,\ldots,K$ in this way:

        $y_j^k = v_j^k$,                  if $\lambda^* \le \bar{y}_j^k - v_j^k$
        $y_j^k = \bar{y}_j^k - \lambda^*$,        if $\bar{y}_j^k - v_j^k < \lambda^* < \bar{y}_j^k$
        $y_j^k = 0$,                      if $\lambda^* \ge \bar{y}_j^k$.

    Terminate with the feasible allocation for arc $j$, $(y_j^1,\ldots,y_j^K)$.
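The per-arc projection can be sketched as follows. This is an illustrative reading of the breakpoint search, under stated assumptions: the function name `project_arc`, the 0-based indexing, and the handling of the degenerate case where the bracketing totals coincide are choices made for this sketch and are not taken from the original code.

```python
def project_arc(y_bar, v, u):
    """Project y_bar onto {y : sum(y) = u, 0 <= y_k <= v_k} in the
    Euclidean norm, for the K commodities sharing a single arc."""
    K = len(y_bar)
    if u > sum(v) or u < 0:
        raise ValueError("no feasible allocation")   # Step 0

    def total(lam):
        # Sum over k of (y_bar_k - lam) clipped to [0, v_k]; this is
        # piecewise linear and nonincreasing in lam.
        return sum(max(min(yk - lam, vk), 0.0) for yk, vk in zip(y_bar, v))

    # The 2K breakpoints, in nondecreasing order.
    b = sorted(list(y_bar) + [yk - vk for yk, vk in zip(y_bar, v)])
    lo, hi = 0, 2 * K - 1              # 0-based counterparts of l and r
    L, R = float(sum(v)), 0.0          # total(b[lo]) and total(b[hi])
    lam = None
    while hi - lo > 1:                 # Step 1: bracket the multiplier
        m = (lo + hi) // 2
        C = total(b[m])                # Step 2
        if C == u:                     # Step 3
            lam = b[m]
            break
        if C > u:
            lo, L = m, C
        else:
            hi, R = m, C
    if lam is None:                    # Step 4: interpolate between breakpoints
        if L == R:                     # degenerate bracket (assumption: any
            lam = b[lo]                # point in it works, take the left end)
        else:
            lam = b[lo] + (b[hi] - b[lo]) * (u - L) / (R - L)
    # Step 5: clip the shifted allocation to its bounds.
    return [max(min(yk - lam, vk), 0.0) for yk, vk in zip(y_bar, v)]

projected = project_arc([3.0, 1.0, -1.0], [2.0, 2.0, 2.0], 3.0)
```

For the sample call, the three commodity allocations are clipped to their bounds and sum to the arc capacity of 3.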

J 

An upper bound algorithm using the subgradient procedure is now
presented. Its convergence is a direct result of Proposition 3.4.


ALGORITHM 3.4  UPPER BOUND ALGORITHM

Step 0  (Initialization)

    Let LB be any lower bound on the solution to MP. Choose a set of
    initial allocations, $y_0 = (y_0^1,\ldots,y_0^K)$, by setting $y_0^k \leftarrow P[(1/K)(u)]$
    for $k = 1,\ldots,K$. Set $\lambda_0 \leftarrow 2$; $i \leftarrow 0$; UB $\leftarrow \infty$.

Step 1  (Find Subgradient)

    Let $(\mu_i^k, \nu_i^k)$ solve $z^k(y_i^k)$ for $k = 1,\ldots,K$. Let
    $\eta_i \leftarrow (-\nu_i^1,\ldots,-\nu_i^K)$. Set UB $\leftarrow z(y_i)$. If $\eta_i = 0$, then terminate with
    $z(y_i)$ optimal.

Step 2  (Move to New Point)

    Compute $s_i \leftarrow \lambda_i [z(y_i) - LB] / \|\eta_i\|^2$. Set $y_{i+1} \leftarrow
    P[y_i - s_i \eta_i]$. Set $i \leftarrow i + 1$. Return to step 1.

We now introduce a heuristic modification of the upper bound
algorithm, which has produced better results on our test problems.
Recall that $\eta = (-\nu^1,\ldots,-\nu^K)$ is a subgradient of $z$ at $(y^1,\ldots,y^K)$.
Then for each arc $j$, the vector

    $\eta(j) = (-\nu_j^1 e_j,\ -\nu_j^2 e_{n+j},\ \ldots,\ -\nu_j^K e_{(K-1)n+j})$

serves to isolate the components of $\eta$ associated with the commodities
flowing on arc $j$. For each such arc $j$ we compute an individual step
size at iteration $i$ as

    $s_i(j) = \lambda_i [z(y_i^1,\ldots,y_i^K) - z^*] / \|\eta_i(j)\|^2$

where $z^*$ is approximated by LB.
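The per-arc step sizes can be computed directly from the dual values on the allocation bounds. The sketch below is a minimal illustration under assumptions: the data layout `nu[k][j]` (commodity k, arc j), the function names, and the convention of using a zero step on arcs whose isolated subgradient vanishes are choices made here, since the text does not say how that case is handled.

```python
def per_arc_step_sizes(nu, z_value, lb, lam):
    """Return s(j) = lam * (z - LB) / ||eta(j)||^2 for every arc j, where
    eta(j) collects the entries -nu[k][j] over all commodities k.
    Arcs with eta(j) = 0 get a step of 0.0 (an assumption of this sketch)."""
    n = len(nu[0])
    steps = []
    for j in range(n):
        norm2 = sum(row[j] ** 2 for row in nu)
        steps.append(lam * (z_value - lb) / norm2 if norm2 > 0 else 0.0)
    return steps

def heuristic_move(y, nu, steps):
    """Unprojected move y - S * eta. Since eta = -nu, this adds
    s(j) * nu[k][j] to the allocation of commodity k on arc j."""
    return [[y[k][j] + steps[j] * nu[k][j] for j in range(len(steps))]
            for k in range(len(y))]

nu = [[1.0, 0.0], [2.0, 0.0]]          # hypothetical duals: 2 commodities, 2 arcs
steps = per_arc_step_sizes(nu, z_value=10.0, lb=5.0, lam=2.0)
moved = heuristic_move([[1.0, 1.0], [1.0, 1.0]], nu, steps)
```

The moved point generally violates the capacity equation on each arc, so it still has to be passed through the projection before it is used as the next allocation.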


Using this idea we now present our heuristic upper bound
algorithm.

ALGORITHM 3.5  HEURISTIC UPPER BOUND ALGORITHM

Step 0  (Initialization)

    Let LB be any lower bound on the solution to MP. Choose a set of
    initial allocations, $y_0 = (y_0^1,\ldots,y_0^K)$, by setting $y_0^k \leftarrow P[(1/K)(u)]$
    for $k = 1,\ldots,K$. Set $\lambda_0 \leftarrow 2$; $i \leftarrow 0$; UB $\leftarrow \infty$.

Step 1  (Find Subgradient)

    Let $(\mu_i^k, \nu_i^k)$ solve $z^k(y_i^k)$ for $k = 1,\ldots,K$. Let $\eta_i =
    (-\nu_i^1,\ldots,-\nu_i^K)$. Set UB $\leftarrow \sum_k z^k(y_i^k)$. If $\eta_i = 0$, then terminate with
    $z(y_i)$ optimal.

Step 2  (Move to New Point)

    Compute $s_i(j) \leftarrow \lambda_i [z(y_i^1,\ldots,y_i^K) - LB] / \|\eta_i(j)\|^2$ for each arc $j$.
    Set $S \leftarrow \mathrm{diag}(s_i(1),\ldots,s_i(n))$. Set
    $(y_{i+1}^1,\ldots,y_{i+1}^K) \leftarrow P[(y_i^1,\ldots,y_i^K) - S\eta_i]$. Set $i \leftarrow i + 1$.
    Go to step 1.


3.4  The Algorithm

In this section we present the composite algorithm for solving MP.
This procedure involves partially solving DP for successively better
lower bounds and partially solving RP for successively better upper
bounds on the optimal objective function value. The algorithm
terminates whenever (a) the solution to DP can be shown to be an exact
optimum; (b) the solution to RP can be shown to be an exact optimum; or
(c) the greatest lower bound and the least upper bound generated are
within a prescribed tolerance, $\epsilon$. In case (c), the best solution to RP
is presented as a guaranteed $\epsilon$-optimal solution.

ALGORITHM 3.6  COMPLETE ALGORITHM

Step 0  (Initialization)

    Let $\epsilon \leftarrow$ termination tolerance $(0 < \epsilon < 1)$; NOLB $\leftarrow$ number of lower bound
    iterations to perform on each pass; NOUB $\leftarrow$ number of upper bound
    iterations to perform on each pass; LB $\leftarrow -\infty$; UB $\leftarrow \infty$.

Step 1  (Lower Bound)

    Perform NOLB iterations of the lower bound algorithm (Algorithm
    3.2). Let LB denote the best lower bound attained so far. If Algorithm
    3.2 terminates in step 1 with an exact optimum, terminate with that
    solution optimal for MP.

Step 2  (Upper Bound)

    Perform NOUB iterations of an upper bound algorithm (Algorithm 3.4
    or 3.5). Let UB denote the best upper bound attained so far. If
    Algorithm 3.4 terminates in step 1 with an exact optimum, terminate with
    that solution optimal for MP.

Step 3  (Check for Termination)

    If $\epsilon(UB) \le LB$ then terminate with UB a guaranteed $\epsilon$-optimum;
    otherwise, go to step 1.
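The alternation between the two bounding procedures can be outlined as a small driver loop. This is a sketch under assumptions, not the EVAC code: `lower_pass` and `upper_pass` are hypothetical callables that each run a fixed number of iterations and report the bound they reached together with a flag saying whether an exact optimum was detected, and the pass limit is an added safeguard.

```python
def composite(lower_pass, upper_pass, eps=0.90, max_passes=100):
    """Alternate lower and upper bound passes until eps * UB <= LB.
    Each pass returns (bound, exact); exact=True means that procedure
    stopped with a provably optimal solution. The ratio test assumes a
    positive objective value, as in the termination check of Step 3."""
    lb, ub = float("-inf"), float("inf")
    for _ in range(max_passes):
        bound, exact = lower_pass()          # Step 1
        lb = max(lb, bound)
        if exact:
            return lb, lb, "exact (lower bound problem)"
        bound, exact = upper_pass()          # Step 2
        ub = min(ub, bound)
        if exact:
            return ub, ub, "exact (upper bound problem)"
        if eps * ub <= lb:                   # Step 3
            return lb, ub, "eps-optimal"
    return lb, ub, "pass limit reached"

# Scripted bounds standing in for real passes (purely illustrative numbers).
lows = iter([(50.0, False), (80.0, False), (92.0, False)])
ups = iter([(120.0, False), (105.0, False), (100.0, False)])
result = composite(lambda: next(lows), lambda: next(ups), eps=0.90)
```

With the scripted numbers, the loop stops on the third pass, when 0.90 times the upper bound of 100 no longer exceeds the lower bound of 92.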

In this algorithm the best solutions for the lower bound and upper
bound problems at each pass are retained and used as starting solutions
for the respective problems on the next pass. The details of our
implementation are presented in Chapter IV.


CHAPTER  IV 


COMPUTATIONAL  EXPERIMENTATION 


This  chapter  provides  descriptions  of  our  computer  implementation 
of  Algorithm  3.6  and  of  the  test  problems  used.  Our  code,  EVAC,  uses 
MODFLO [1] to solve the single commodity minimum cost network flow
subproblems  which  arise  in  Algorithm  3.2  and  in  Algorithm  3.5.  MODFLO 
is  a  set  of  routines  which  may  be  used  to  solve  a  network  flow  problem  or 
to  reoptimize  a  previously  solved  problem  after  changes  are  made  in  some 
of  the  data.  MODFLO,  which  is  based  on  NETFLO  [52],  allows  the  user  to 
change  bounds,  costs,  and/or  requirements  and  then  reoptimize  from  a 
basis  which  was  optimal  for  the  original  problem. 

We  tested  EVAC  on  22  randomly  generated  multicommodity  network 
flow  problems  and  on  one  test  problem  which  was  specially  structured  to 
be  solved  by  EVAC.  The  test  problems  ranged  in  size  from  22  to  754 
nodes  and  from  53  to  1,102  arcs  with  from  0  to  599  linking  constraints 
and  from  3  to  20  commodities.  The  equivalent  LP  sizes  are  between  232 
and  8,904  rows  and  between  470  and  12,111  columns.  The  22  randomly 
generated  problems  were  created  using  MNETGN  [5],  a  multicommodity 
network  problem  generator.  The  problems  were  solved  by  EVAC  and  by  MCNF 
[51],  a  multicommodity  network  flow  code  which  uses  a  primal  parti¬ 
tioning  algorithm.  Solution  times  are  compared  and  conclusions  are 
drawn  concerning  the  relative  effectiveness  of  the  techniques. 


In  this  section  we  present  a  description  of  MCNF  and  EVAC,  the  two 
computer  codes  used  in  our  experimentation.  Both  programs  are  written 
in  standard  FORTRAN  and  have  been  tailored  to  neither  our  equipment  nor 
our  FORTRAN  compiler. 

4.1.1  MCNF 

MCNF was developed by Jeff Kennington at Southern Methodist
University, Dallas, TX. It is an in-core multicommodity network flow
problem solver which uses the modification of the revised simplex method
known as the primal partitioning algorithm [36]. In this algorithm the
basis inverse is maintained as a set of rooted spanning trees (one for
each commodity) and a working basis inverse is maintained in product
form. The working basis inverse has dimension equal to the number of
binding linking constraints corresponding to the current basis. The
initial basis is created using a multicommodity variation of the routine
used in NETFLO. A partial pricing scheme is used; the pricing tolerance
is 1.E-6 and the pivot tolerance is 1.E-8.

4.1.2  EVAC 

EVAC  is  our  implementation  of  Algorithm  3.6  for  solving  the 
multicommodity  network  flow  problem.  Note  that  Algorithm  3.6  alternates 
between  generating  lower  bounds  using  Algorithm  3.2  and  generating  upper 
bounds  using  Algorithm  3.5.  Since  both  the  lower  bound  problem  (DP)  and 
the  upper  bound  problem  (RP)  decompose  on  commodities,  EVAC  maintains 
only  the  information  concerning  the  current  commodity  in  main  memory. 

The problem data and most recent bases for all the other commodities are
kept on peripheral storage. At the user's option EVAC stores in main
memory as much of the current set of allocations, $(y_i^1,\ldots,y_i^K)$, and
current dual variables, $(-\nu_i^1,\ldots,-\nu_i^K)$, as desired. All our test
problems (with the exception of Problem 23) were solved with all the
allocations and dual variables in core.

Both the lower bound routine and the upper bound routine use
MODFLO as the optimizer for the single commodity subproblems. MODFLO
uses  the  same  partial  pricing  scheme  as  NETFLO  and  drives  the  flow  on 
artificial  arcs  to  zero  using  the  Big-M  method.  The  Big-M  value  that 
was  used  for  our  test  problems,  except  as  noted  in  Table  4.1,  was  7 
times  the  largest  unit  cost  in  the  given  problem.  At  subsequent 
iterations,  initial  bases  for  each  commodity  are  just  the  optimal  bases 
for  the  previous  set  of  Lagrange  multipliers.  A  basis  for  the  upper- 
bound  problem  is  generated  by  constructing  a  feasible  basis  from  the 
previous optimal basis using the rules described in [1].

In practice we did not update the multipliers for the step sizes
($\alpha_i$ in Algorithm 3.2 and $\lambda_i$ in Algorithm 3.5) at every iteration, but
only when the improvement in the objective function was too small. As
Algorithm 3.2 requires a finite upper bound (for calculation of the
step size in step 2) we used an initial value of UB = 1.1 * LB.
Thereafter for UB we used the best upper bound generated so far. The
parameters and tolerance used in all our testing were these:

    $\epsilon$ = .90
    NOLB = 5
    NOUB = 5
    Pricing Tolerance = 1.E-2


4.2  Description  of  the  Test  Problems 

The multicommodity network problem generator, MNETGN, was used to
create 22 random test problems. We modified the MNETGN output so that
every arc appeared in every commodity's subproblem by adding arcs with
upper bounds of zero where necessary. The test problems ranged in size
from 22 to 734 nodes and from 53 to 1,102 arcs with from 0 to 599
linking constraints and from 3 to 20 commodities. The equivalent LP
sizes are between 232 and 8,904 rows and between 470 and 12,111 columns.
The number of linking constraints corresponds to a wide variety of
problems from pure network problems (no linking constraints) to problems
in which over 75% of the arcs are included in linking constraints.

Problem 15 was provided by Lt. Col. Dennis McLain, the Assistant
Director of Operations Research at the Military Airlift Command located
at Scott Air Force Base.


4.3 Summary of Computational Results

All the testing (except for Problems 15, 21, and 23) was done on a
CDC 6600 at Southern Methodist University, using the FTN compiler with
the optimization feature enabled. Except for Problems 7 and 23, a
guaranteed ε-optimum was obtained for each problem with ε ≥ 90%.

Problem 7 experienced convergence difficulties when run using EVAC.
Problem 8 was created from Problem 7 by increasing the linking
constraint bounds by 10%. As indicated in Table 4.1, this slight
modification enabled EVAC to solve the problem easily. We limited the
number of lower bound iterations and upper bound iterations to 100,
even though Problem 7 had not achieved 90% optimality by that point.
Because of this the solution times for Problem 7 are given in Table 4.1
but are not included in the summary data.

Problem 23 was created to allow us to test EVAC on a relatively
large problem. This problem (with 8,904 LP rows and 12,111 LP columns)
was too large for MCNF to solve in the available memory, so we were not
able to compare solution times for the two codes on this problem. In
addition, due to the memory limitations on the CDC 6600, we were forced
to use a CDC 205 to test Problem 23. For this reason the times for
Problem 23 are included in Tables 4.1 and 4.2, but are not included in
the totals and summary information. Since the testing on the CDC 205
involved a real-dollar expense, we were satisfied to stop when a 75%
optimum was attained. The test runs for Problems 15 and 21 were made on
a CDC Cyber 73. But since both the EVAC and MCNF runs for these
problems were made on the Cyber 73, the totals and summary data include
the times for Problems 15 and 21.

Details of the test problems are given in Table 4.1. The times are
in CPU seconds and exclude the time required to input the problem data
and print the solution reports. Table 4.1 also presents a comparison of
the times required for MCNF and EVAC to solve each problem. In order to
present a meaningful comparison of the solution times for MCNF and EVAC,
we also present the solution times for EVAC exclusive of the extra I/O
required to maintain the costs, bounds, and old bases for the
subproblems on peripheral storage. Since MCNF maintains all this
information in main memory, this seems to be the most reasonable way of
comparing timing statistics. The column titled "Guaranteed % Optimal"
gives the best lower bound generated by EVAC as a percent of the best
upper bound generated by EVAC. The column titled "Actual % Optimal"
presents the actual optimal objective (as obtained by MCNF) as a percent
of the best upper bound generated by EVAC.
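
The two percentage columns just described can be illustrated with a
short sketch. The function name and the numeric bounds below are
hypothetical and are not taken from Table 4.1; the sketch only assumes a
minimization problem with a positive best upper bound.

```python
def percent_of_upper_bound(value, best_upper_bound):
    """Express `value` as a percentage of the best upper bound."""
    if best_upper_bound <= 0:
        raise ValueError("this sketch assumes a positive upper bound")
    return 100.0 * value / best_upper_bound


# Hypothetical bounds for one problem (illustrative values only).
best_lb = 940.0    # best lower bound found by the bounding procedure
best_ub = 1000.0   # best upper bound (objective of a feasible solution)
true_opt = 955.0   # exact optimum, as an exact code would report it

guaranteed_pct = percent_of_upper_bound(best_lb, best_ub)
actual_pct = percent_of_upper_bound(true_opt, best_ub)
```

Because the lower bound never exceeds the true optimum, the guaranteed
percentage can never exceed the actual percentage.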

Table  4.2  provides  the  details  of  the  times  required  by  EVAC  to 
perform  various  steps  of  the  algorithm.  The  column  titled  "%  of  Time  in 
Other"  for  the  lower  bound  computations  shows  the  time  required  for  such 
activities  as  computing  the  Lagrange  multipliers,  updating  the  unit 
costs  to  reflect  these  changes,  computing  the  resulting  dual  variables, 
and  various  bookkeeping  activities.  The  corresponding  column  for  upper 
bound  computations  reflects  such  activities  as  calculating  the  dual 
variables,  testing  the  termination  criteria,  and  various  other  short 
computations. 

Table 4.3 summarizes the time comparisons graphically. The
problems are grouped by number of commodities, as they are in Tables 4.1
and 4.2.

4.4 Analysis of Results

It seems clear from Tables 4.1 and 4.3 that EVAC severely
dominates MCNF whenever the number of commodities is small. This is due
to the fact that, for EVAC, quite a bit of additional overhead is
involved in alternating between commodities. This overhead is not just
a result of I/O, although that is a great deal of it, but is also due to
the set-up time required for activities such as constructing a new
feasible basis from an old basis and calculating the resulting dual
variables. MCNF, on the other hand, is primarily driven by the number
of binding linking constraints in the optimal solution. This is because
MCNF seeks an exact optimum.

Letting T(EVAC) denote the average time required by EVAC (exclusive of
I/O), and T(MCNF) denote the average time required by MCNF, we can
express the following relationships:

for the 3-commodity test problems,

    T(EVAC) = .354 * T(MCNF);

for the 4-commodity test problems,

    T(EVAC) = .469 * T(MCNF);

for the 5-commodity test problems,

    T(EVAC) = .666 * T(MCNF);

and for the test problems with 6 or more commodities,

    T(EVAC) = .975 * T(MCNF).

It should also be noted that EVAC is capable of solving larger
problems than MCNF. This is due to the fact that EVAC stores only one
copy of the network defining data in main memory, whereas MCNF requires
one copy for each commodity. Also, EVAC maintains in main memory the
current basis, cost and bound data for only one commodity at a time.
Thus, for a K-commodity problem, EVAC uses on the order of 1/K the main
memory required by MCNF.

Note that the entries in the "Guaranteed % Optimal" and "Actual %
Optimal"  columns  of  Table  4.1  are  quite  close.  This  indicates  that  the 
sequence  of  lower  bounds  converged  to  values  very  near  optimality.  In 
addition,  from  Table  4.2,  we  see  that  the  lower  bound  iterations  are 
typically  less  time  consuming  than  the  upper  bound  iterations. 

It is worth observing that EVAC was designed for very large
problems which would never be solved to optimality. Even if a problem
does not converge to within the requested tolerance in a prescribed
number of iterations, EVAC always provides a feasible solution which is
a guaranteed ε-optimum for some ε > 0. In contrast, MCNF provides only an
upper bound on the optimum objective value, with no indication of how
close it is to optimality until an exact optimum is actually attained.

We conclude that EVAC works extremely well in obtaining a
guaranteed ε-optimum for the multicommodity network flow problem. While
it is not as "robust" as the simplex-based MCNF, it is a good choice for
the class of problems for which it was developed, the very large
casualty evacuation models.


CHAPTER V


SUMMARY  AND  CONCLUSIONS 


This  chapter  presents  a  summary  of  the  results  reported  in 
Chapter  IV  and  shares  conclusions  regarding  the  relative  effectiveness 
of  our  technique.  It  also  includes  ideas  for  further  investigation  in 
the  area. 

5.1  Summary  and  Conclusions 

Algorithm 3.6 describes our technique for finding an ε-optimal
solution  for  the  multicommodity  network  flow  problem.  Our  technique 
differs  from  other  approaches  to  the  problem  in  that,  rather  than 
solving  the  multicommodity  problem  directly,  we  compute  sequences  of 
lower  and  upper  bounds  on  the  optimal  objective  function  value, 
terminating  when  the  bounds  are  within  a  prescribed  tolerance.  Both 
the  lower  and  upper  bound  algorithms  use  a  subgradient  optimization 
technique  and  both  decompose  on  commodities  so  that  only  a  single 
commodity  minimum  cost  network  flow  optimizer  is  required.  At  each 
iteration  of  the  lower  bound  routine  (Algorithm  3.2),  an  initial  basis 
is  generated  from  the  previous  optimal  basis  by  modifying  the  costs  to 
correspond  to  the  new  Lagrange  multipliers,  and  updating  the  dual 
variables. At each iteration of the upper bound routine (Algorithm
3.5), an initial basis is constructed from the previous optimal basis
using the rules described in [1] to restore feasibility (if
necessary) after changing the bounds to correspond to the new
allocations.


The  subgradients  for  the  lower  bounds  are  computed  to  be  the  sum 
of  the  flows  on  the  mutually  constrained  arcs  minus  the  associated 
mutual  arc  capacities.  For  the  upper  bounds,  subgradients  are 
computed  using  the  dual  variables  obtained  when  solving  the  single 
commodity  network  problems. 
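
A minimal sketch of the lower-bound subgradient described above: for
each mutually constrained arc, sum the flows over all commodities and
subtract the mutual capacity. The function name and the dictionary
layout are assumptions made for illustration; they are not taken from
the EVAC code.

```python
def lower_bound_subgradient(flows_by_commodity, mutual_capacity):
    """Return {arc: total flow over all commodities - mutual capacity}.

    flows_by_commodity: list of dicts mapping arc -> flow, one per commodity.
    mutual_capacity:    dict mapping each mutually constrained arc -> capacity.
    """
    return {
        arc: sum(flows.get(arc, 0.0) for flows in flows_by_commodity) - cap
        for arc, cap in mutual_capacity.items()
    }


# Hypothetical two-commodity example with two linked arcs.
flows = [{"a": 3.0, "b": 1.0}, {"a": 4.0, "b": 2.0}]
capacity = {"a": 5.0, "b": 6.0}
subgradient = lower_bound_subgradient(flows, capacity)
```

A positive component marks a violated linking constraint (arc "a" here),
and a negative component marks slack (arc "b").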

Our computational work included solving each one of 23 problems
twice: once using MCNF, a primal partitioning code, and once using
EVAC, our implementation of Algorithm 3.6. On the average EVAC
required only 65% of the time required by MCNF (ignoring I/O). EVAC's
performance was far superior on the problems with fewer commodities
and was not as impressive on the problems involving many commodities.
In addition, for a K-commodity problem EVAC required on the order of
1/K the amount of main memory required by MCNF.

5.2 Areas for Future Investigation

Algorithm 3.6 involves two more or less independent processes.
That is, there is no reason why the lower bound generator (Algorithm
3.2) and the upper bound generator (Algorithm 3.5) could not proceed
independently, stopping now and then to exchange their best bounds and
test for optimality. Hence it appears that this procedure is
well-suited to exploit the benefits of a parallel processing
environment. In addition to the partitioning of the technique into
two separate procedures, within each of these procedures the
decomposition by commodities could take advantage of a parallel
processing scheme as well. It would seem reasonable to expect such a
scheme to speed up the execution time considerably, especially when
solving a very large problem.

There  is  also  room  for  additional  experimentation  with  the  step 
sizes,  specifically  with  the  multipliers  on  the  step  sizes.  Perhaps  a 
scheme  in  which  the  multipliers  were  allowed  to  be  reset  to  their 
starting  values  a  finite  number  of  times  would  speed  up  convergence. 
One  might  reset  these  multipliers  whenever  the  improvement  in  the 
sequence  of  upper  (lower)  bounds  fell  below  some  tolerance.  This 
would  have  the  effect  of  restarting  the  algorithm  at  that  point,  but 
with  a  far  better  "starting  solution". 

In addition, this problem has a multiperiod structure. Since the
network is replicated for 60 one-day time periods, it might be
advantageous to exploit this structure using a forward simplex
approach.


CHAPTER 2


Networks  with  Side  Constraints: 
An  LU  Factorization  Update 


Richard S. Barr, Keyvan Farhangian, Jeffery L. Kennington


An important class of mathematical programming models which are
frequently used in logistics studies is the model of a network problem having
additional linear constraints. A specialization of the primal simplex algorithm
which exploits the network structure can be applied to this problem class. This
specialization maintains the basis as a rooted spanning tree and a general
matrix called the working basis. This paper presents the algorithms which may
be used to maintain the inverse of this working basis as an LU factorization,
which is the industry standard for general linear programming software. Our
specialized code exploits not only the network structure but also the sparsity
characteristics of the working basis. Computational experimentation indicates
that our LU implementation results in a 50 percent savings in the non-zero
elements in the eta file, and our computer codes are approximately twice as fast
as MINOS and XMP on a set of randomly generated multicommodity network
flow problems.

Good software for solving linear programming models is one of the
most important tools available to the logistics engineer. For logistics
studies, these linear programs frequently involve a very large network of
nodes and arcs, which may be duplicated by time period. For example,
nodes may represent given cities at a particular point in time while arcs
represent roads, railways, and legs of flights connecting these cities.
Some nodes are designated as supply nodes, others demand nodes,
while some may simply represent points of transshipment. The mathematical
model characterizes a solution such that the supply is shipped to the
demand nodes at least cost while not violating either the upper or lower
bounds on the flow over an arc.

If the main structure of a logistics problem can be captured in a
network model, then the size of solvable problems becomes enormous.
Hence, more realistic situations can be modelled that would otherwise lie
outside the domain of general linear programming techniques. For
example, one current logistics planning model involves 200 nodes and (365
days/yr) (30 years) = 10,950 time periods to give over 2,000,000
constraints. Network problems having 20,000 constraints and 20,000,000
variables are solved routinely at the U.S. Treasury Department.

Unfortunately, the pure network structure may require simplification of
the problem to the point that key policy restrictions must be omitted. The
work presented in this study builds upon existing large-scale network
solution technology to allow for the inclusion of arbitrary additional
constraints. Typical constraints include capacities on vehicles carrying
different types of goods, restrictions on the total number of vehicles available
for assignment, and budget restrictions. The addition of even a few
nonnetwork constraints can greatly enhance the realism and usability of
these models. Our approach exploits, to as great an extent as possible,
the traditional network portion of the problem while simultaneously
enforcing any additional restrictions imposed by the practitioner.

For general linear programming systems, the most important
component is the algorithm used to update the basis inverse. Due to the
excellent sparsity and numerical stability characteristics, an LU factorization
with either a Bartels-Golub or Forrest-Tomlin update has been adopted
for modern linear programming systems. For pure network problems, the
basis is always triangular and corresponds to a rooted spanning tree. The
modern network codes which exploit this structure have been found to be
from one to two orders of magnitude faster than the general linear
programming systems. In this paper, we have combined these two powerful
techniques into an algorithm for solving network models having additional
side constraints.

Let A be an m x n matrix, let c and u be n-component vectors, and let
b be an m-component vector. Without loss of generality, the linear
program may be stated mathematically as follows:

\[ \text{minimize} \quad cx \tag{1} \]
\[ \text{subject to} \quad Ax = b \tag{2} \]
\[ 0 \le x \le u. \tag{3} \]

The network with side constraints model is a special case of (1)-(3) in
which A takes the form

\[ A = \begin{bmatrix} M & 0 \\ S & P \end{bmatrix}
   \begin{matrix} \}\; n \\ \}\; m \end{matrix} \]

where M is a node-arc incidence matrix. If m = 0, then (1)-(3) is a pure
network problem.


1.1  Applications 

There are numerous applications of the network with side constraints
model. Professor Glover and his colleagues have solved a large
passenger-mix model for Frontier Airlines and a large land management
model for the Bureau of Land Management (see [7, 8]). A world grain
export model has been solved to help analyze the port capacity of U.S.
ports during the next decade (see [2]). A cargo routing model is being
used by the Air Force Logistics Command to assist in routing cargo
planes for the distribution of serviceable spares (see [1]). Lt. Col. Dennis
McLain has developed a large model to assist in the development of a
casualty evacuation plan in the event of a European conflict (see [14]). A
National Forest Management Model has been developed to aid forest
managers in long term planning for national forests (see [10]). In addition,
work is currently underway which attempts to convert general linear
programs into the network with side constraints model (see [4, 16]).


1.2  Objective  of  Investigation 

Due to both storage and time considerations, the basis inverse is
maintained as an LU factorization in modern LP software (see [3, 5, 15]). The
objective of this investigation is to extend these ideas to the primal
partitioning algorithm when applied to the network with side constraints
model.


1.3  Notation 

The i-th component of the vector a will be denoted by $a_i$. The (i,j)-th
element of the matrix A is denoted by $A_{ij}$. A(i) and A[i] denote the i-th
column and i-th row of the matrix A, respectively. 0 denotes a vector of
zeroes, 1 denotes a vector of ones, and $e^k$ denotes a vector with a 1 in
the k-th position and zeroes elsewhere. Sigma is used to denote the scalar
signum function defined by

\[ \sigma(y) = \begin{cases} 1, & \text{if } y > 0 \\ 0, & \text{if } y = 0 \\ -1, & \text{if } y < 0. \end{cases} \]

The identity matrix is given by I.


II.  THE  PRIMAL  SIMPLEX  ALGORITHM 


We assume that A has full row rank and that there exists a feasible
solution for (1)-(3). Given a basic feasible solution, we may partition A, c,
x, and u into basic and nonbasic components, that is, $A = [B | N]$,
$c = [c^B | c^N]$, $x = [x^B | x^N]$, and $u = [u^B | u^N]$. Using the above
partitioning, the primal simplex algorithm may be stated as follows:


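PRIMAL SIMPLEX ALGORITHM

0. Initialization. Let $[x^B | x^N]$ be a basic feasible solution.

1. Pricing. Let $\pi = c^B B^{-1}$. Define

\[ \psi_1 = \{ i : x_i^N = 0 \text{ and } \pi N(i) > c_i^N \}, \]
\[ \psi_2 = \{ i : x_i^N = u_i^N \text{ and } \pi N(i) < c_i^N \}. \]

If $\psi_1 \cup \psi_2 = \emptyset$, terminate with $[x^B | x^N]$ optimal;
otherwise, select $k \in \psi_1 \cup \psi_2$ and set $\delta \leftarrow 1$ if
$k \in \psi_1$ and $\delta \leftarrow -1$ otherwise.

2. Ratio Test. Set $y \leftarrow B^{-1} N(k)$. Set

\[ \Delta_1 \leftarrow \min_{\sigma(y_j) = \delta} \left\{ \frac{x_j^B}{|y_j|}, \infty \right\}, \]
\[ \Delta_2 \leftarrow \min_{-\sigma(y_j) = \delta} \left\{ \frac{u_j^B - x_j^B}{|y_j|}, \infty \right\}. \]

Set $\Delta \leftarrow \min \{ \Delta_1, \Delta_2, u_k^N \}$.

If $\Delta \ne \infty$, then go to 3; otherwise, terminate with the
conclusion that the problem is unbounded.

3. Update Values. Set $x_k^N \leftarrow x_k^N + \Delta\delta$ and
$x^B \leftarrow x^B - \Delta\delta y$. If $\Delta = u_k^N$, return to step 1.

4. Update Basis Inverse. Let

\[ \psi_3 = \{ j : x_j^B = 0 \text{ and } \sigma(y_j) = \delta \}, \]
\[ \psi_4 = \{ j : x_j^B = u_j^B \text{ and } -\sigma(y_j) = \delta \}. \]

Select any $\ell \in \psi_3 \cup \psi_4$. In the basis, replace $B(\ell)$ with
N(k), update the inverse of the new basis, and return to step 1.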
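
The ratio test of step 2 can be sketched in a few lines. This is an
illustrative reading of the step, not the authors' code: the function
names `sigma` and `ratio_test`, the list-based data layout, and the
numeric example are all assumptions.

```python
import math


def sigma(v):
    """Scalar signum function: 1, 0, or -1."""
    return (v > 0) - (v < 0)


def ratio_test(x_B, u_B, y, delta, u_k):
    """Return the step length Delta of step 2.

    x_B, u_B: current values and upper bounds of the basic variables.
    y:        the updated entering column B^{-1} N(k).
    delta:    +1 if the entering variable increases, -1 if it decreases.
    u_k:      upper bound of the entering variable (may be math.inf).
    """
    # Basic variables that move toward their lower bound of zero.
    d1 = min((x / abs(yj) for x, yj in zip(x_B, y) if sigma(yj) == delta),
             default=math.inf)
    # Basic variables that move toward their upper bound.
    d2 = min(((u - x) / abs(yj) for x, u, yj in zip(x_B, u_B, y)
              if -sigma(yj) == delta),
             default=math.inf)
    return min(d1, d2, u_k)


# Hypothetical data: three basic variables.
step_up = ratio_test([2.0, 1.0, 4.0], [5.0, 3.0, 6.0], [1.0, -2.0, 0.0], 1, 10.0)
step_down = ratio_test([2.0, 1.0, 4.0], [5.0, 3.0, 6.0], [1.0, -2.0, 0.0], -1, 10.0)
```

Components with $y_j = 0$ never enter either minimum, so no division by
zero can occur; an infinite result corresponds to the unbounded case.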


III. THE PARTITIONED BASIS


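The network with side constraints model may be stated as follows:

\[ \text{minimize} \quad c^1 x^1 + c^2 x^2 \tag{4} \]
\[ \text{subject to} \quad M x^1 = b^1 \tag{5} \]
\[ S x^1 + P x^2 = b^2 \tag{6} \]
\[ 0 \le x^1 \le u^1 \tag{7} \]
\[ 0 \le x^2 \le u^2. \tag{8} \]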


We may assume without loss of generality that:

(i) The graph associated with M has n nodes and is connected (i.e.,
there exists an undirected path between every pair of nodes).

(ii) [S|P] has full row rank (i.e., rank [S|P] = m).

(iii) Total supply equals total demand (i.e., $1 b^1 = 0$).

Since the rank of system (5) is one less than the number of rows, we
add what has been called the root arc to (5) to obtain

\[ M x^1 + e^p a = b^1 \]

where $0 \le a \le 0$ and $1 \le p \le n$.
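
The rank statement can be checked on a small example. The sketch below
builds the node-arc incidence matrix of a hypothetical connected
four-node graph, computes its rank with exact rational arithmetic, and
then appends a root arc column $e^p$. The helper names, the arc list,
and the choice of root node are assumptions made for illustration.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    m = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    for c in range(len(m[0])):
        piv = next((r for r in range(rk, len(m)) if m[r][c] != 0), None)
        if piv is None:
            continue
        m[rk], m[piv] = m[piv], m[rk]
        for r in range(len(m)):
            if r != rk and m[r][c] != 0:
                f = m[r][c] / m[rk][c]
                m[r] = [a - f * b for a, b in zip(m[r], m[rk])]
        rk += 1
        if rk == len(m):
            break
    return rk


def incidence(n, arcs):
    """Node-arc incidence matrix: +1 at the tail node, -1 at the head node."""
    M = [[0] * len(arcs) for _ in range(n)]
    for j, (tail, head) in enumerate(arcs):
        M[tail][j] = 1
        M[head][j] = -1
    return M


arcs = [(0, 1), (1, 2), (2, 3), (0, 3)]   # hypothetical connected graph
M = incidence(4, arcs)
p = 2                                      # hypothetical root node (0-based)
M_root = [row + [1 if i == p else 0] for i, row in enumerate(M)]

rank_without_root = rank(M)
rank_with_root = rank(M_root)
```

Every column of M sums to zero, which is why one row is redundant; the
root arc column does not sum to zero and so restores full row rank.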

Then the constraint matrix for the network with side constraints model
becomes

\[ A = \begin{bmatrix} M & 0 & e^p \\ S & P & 0 \end{bmatrix} \]

and A has full row rank.


It is well-known that every basis for A may be placed in the form

\[ B = \begin{bmatrix} T & C \\ D & F \end{bmatrix} \]

where T corresponds to a rooted spanning tree and

\[ B^{-1} = \begin{bmatrix} T^{-1} + T^{-1} C Q^{-1} D T^{-1} & -T^{-1} C Q^{-1} \\ -Q^{-1} D T^{-1} & Q^{-1} \end{bmatrix} \]

where $Q = F - D T^{-1} C$. The objective of this paper is to give algorithms
which maintain $Q^{-1}$ as an LU factorization.
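
The block formula for the basis inverse can be verified numerically. The
sketch below assembles the four blocks from T, C, D, and F and compares
the result with a direct inverse, using exact rational arithmetic. The
helper functions and the small matrices are hypothetical; they only
illustrate the identity with $Q = F - D T^{-1} C$.

```python
from fractions import Fraction


def mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def neg(A):
    return [[-a for a in row] for row in A]


def inv(A):
    """Gauss-Jordan inverse of a nonsingular square matrix."""
    n = len(A)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(A)]
    for c in range(n):
        piv = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[piv] = aug[piv], aug[c]
        d = aug[c][c]
        aug[c] = [v / d for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


# Hypothetical blocks: T is triangular, as a spanning-tree basis would be.
T = [[1, 0], [-1, 1]]
C = [[1], [0]]
D = [[2, 1]]
F = [[5]]

Ti = inv(T)
Q = add(F, neg(mul(mul(D, Ti), C)))        # working basis Q = F - D T^{-1} C
Qi = inv(Q)
TiC, DTi = mul(Ti, C), mul(D, Ti)

top = [l + r for l, r in zip(add(Ti, mul(mul(TiC, Qi), DTi)), neg(mul(TiC, Qi)))]
bottom = [l + r for l, r in zip(neg(mul(Qi, DTi)), Qi)]
block_inverse = top + bottom

B = [T[0] + C[0], T[1] + C[1], D[0] + F[0]]
```

Only the small matrix Q needs a general inverse; T is triangular, which
is the structural advantage the partitioning exploits.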


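IV. THE INVERSE UPDATE


Recall that the partitioned basis takes the form

\[ B = \begin{bmatrix} T & C \\ D & F \end{bmatrix}, \]

in which the first block column holds the key columns and the second
block column holds the nonkey columns, and let

\[ \bar{B} = BL. \]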


The inverse update requires a technique for obtaining a new $Q^{-1}$ after a
basis exchange. Let $B_i$, $L_i$, $\bar{B}_i$, and $Q_i$ denote the above
matrices at iteration i. Then we want an expression for $Q_{i+1}^{-1}$ in
terms of $Q_i^{-1}$. The transformation takes the form

\[ B_{i+1}^{-1} = E B_i^{-1} \tag{11} \]

where E is either an elementary column matrix or a permutation matrix.
Let E be partitioned to be compatible with B. That is,

\[ E = \begin{bmatrix} E_1 & E_2 \\ E_3 & E_4 \end{bmatrix}. \]

By examining the (2,2) partition of $B_{i+1}^{-1}$ we obtain

\[ Q_{i+1}^{-1} = (E_4 - E_3 T^{-1} C) Q_i^{-1}. \tag{12} \]

In determining the updating formulae, we must examine two major
cases with subcases.

Case 1. The leaving column is nonkey. For this case, E takes the form

\[ E = \begin{bmatrix} I & E_2 \\ 0 & E_4 \end{bmatrix} \]

and (12) reduces to $Q_{i+1}^{-1} = E_4 Q_i^{-1}$.

Case 2. The leaving column is key.

Let $y = e^j T^{-1} C$. If $y_k \ne 0$, then the k-th column of C can be
interchanged with the j-th column of T and the new T will be nonsingular.

Subcase 2a. $y \ne 0$. Suppose $y_k \ne 0$. Then $E_4 - E_3 T^{-1} C$
reduces to

\[ R = \begin{bmatrix} I & & \\ & -e^j T^{-1} C & \\ & & I \end{bmatrix} \leftarrow \text{row } k \tag{13} \]

that is, the identity matrix with its k-th row replaced by
$-e^j T^{-1} C$, and $Q_{i+1}^{-1} = R Q_i^{-1}$. Case 1 is applied to
complete the update.

Subcase 2b. $y = 0$. For this case no interchange is possible, the entering
column becomes key, and $Q_{i+1}^{-1} = Q_i^{-1}$.


V. AN LU UPDATE


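\[ U^i = \begin{bmatrix} 1 & & \times & & \\ & \ddots & \vdots & & \\ & & \times & & \\ & & & \ddots & \\ & & & & 1 \end{bmatrix}
\qquad
L^i = \begin{bmatrix} 1 & & & & \\ & \ddots & & & \\ & & \times & & \\ & & \vdots & \ddots & \\ & & \times & & 1 \end{bmatrix} \]

Each matrix is an identity matrix with one column replaced: in $U^i$ the
nonzero entries of that column lie on and above the diagonal, and in
$L^i$ they lie on and below the diagonal. Matrices of the form given by
$U^i$ and $L^i$ are called upper etas and lower etas, respectively.
Suppose we have a factorization of $Q^{-1}$ in the form

\[ Q^{-1} = U^1 U^2 \cdots U^m F^s F^{s-1} \cdots F^1 \tag{14} \]

where $F^1, \ldots, F^s$ are a combination of row and column etas. The right
side of (14) is referred to as the eta file where only the non-identity rows
and columns are stored. Suppose that the k-th column of Q is replaced by
$\bar{Q}(k)$ to form the new m by m working basis $\bar{Q}$. This section
presents algorithms which may be used to update (14) to produce
$\bar{Q}^{-1}$ in the same form.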


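5.1 Nonkey Column Leaves The Basis

If k = m, then let $\beta = F^s \cdots F^1 \bar{Q}(k)$, let

\[ L^m = \begin{bmatrix} 1 & & & \\ & \ddots & & \\ & & 1 & \\ & & & 1/\beta_m \end{bmatrix} \]

and let

\[ \bar{U}^m = \begin{bmatrix} 1 & & & -\beta_1 \\ & \ddots & & \vdots \\ & & 1 & -\beta_{m-1} \\ & & & 1 \end{bmatrix}. \]

We will show that
$\bar{Q}^{-1} = U^1 \cdots U^{m-1} \bar{U}^m L^m F^s \cdots F^1$.

If k < m, then let $R^k = I$ and

\[ Q^{-1} = U^1 \cdots U^k R^k U^{k+1} \cdots U^m F^s \cdots F^1. \tag{15} \]

We next define a new upper eta, $\bar{U}^k$, and a new row eta, $R^{k+1}$,
such that

\[ R^k U^{k+1} = \bar{U}^k R^{k+1}. \tag{16} \]

Substituting (16) into (15) yields

\[ Q^{-1} = U^1 \cdots U^k \bar{U}^k R^{k+1} U^{k+2} \cdots U^m F^s \cdots F^1. \tag{17} \]

We again define two new etas, $\bar{U}^{k+1}$ and $R^{k+2}$, such that

\[ R^{k+1} U^{k+2} = \bar{U}^{k+1} R^{k+2}. \tag{18} \]

Substituting (18) into (17) yields

\[ Q^{-1} = U^1 \cdots U^k \bar{U}^k \bar{U}^{k+1} R^{k+2} U^{k+3} \cdots U^m F^s \cdots F^1. \]

Repeating this process eventually yields

\[ Q^{-1} = U^1 \cdots U^k \bar{U}^k \cdots \bar{U}^{m-1} R^m F^s \cdots F^1. \tag{19} \]

Let $\gamma = R^m F^s \cdots F^1 \bar{Q}(k)$, let $L^m$ be the identity matrix
with its k-th column replaced by

\[ L^m(k) = (0, \ldots, 0, \; 1/\gamma_k, \; -\gamma_{k+1}/\gamma_k, \ldots, -\gamma_m/\gamma_k)^T, \]

and let $\bar{U}^m$ be the identity matrix with its k-th column replaced by

\[ \bar{U}^m(k) = (-\gamma_1, \ldots, -\gamma_{k-1}, \; 1, \; 0, \ldots, 0)^T. \]

Then $\bar{U}^m L^m \gamma = e^k$ and we will show that
$\bar{Q}^{-1} = U^1 \cdots U^{k-1} \bar{U}^k \cdots \bar{U}^m L^m R^m F^s \cdots F^1$.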
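
The claim that the new lower eta and upper eta together map the updated
entering column to a unit vector can be checked directly. The sketch
below follows the assignments in step 6 of ALG 1 (a lower eta with
pivot $1/\gamma_k$ and an upper eta with unit diagonal); the function
names, the 0-based indexing, and the sample vector are assumptions.

```python
from fractions import Fraction


def identity(m):
    return [[Fraction(int(i == j)) for j in range(m)] for i in range(m)]


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def build_final_etas(gamma, k):
    """Build the lower eta L and upper eta U for pivot position k (0-based).

    Column k of L holds 1/gamma_k on the diagonal and -gamma_j/gamma_k below;
    column k of U holds -gamma_j above the diagonal and 1 on the diagonal.
    """
    m = len(gamma)
    L, U = identity(m), identity(m)
    L[k][k] = 1 / gamma[k]
    for j in range(k + 1, m):
        L[j][k] = -gamma[j] / gamma[k]
    for j in range(k):
        U[j][k] = -gamma[j]
    return L, U


# Hypothetical updated column with a nonzero pivot element in position k = 1.
gamma = [Fraction(3), Fraction(-2), Fraction(4), Fraction(5)]
L, U = build_final_etas(gamma, 1)
after_L = matvec(L, gamma)
after_U = matvec(U, after_L)
```

The lower eta scales the pivot to one and clears the entries below it;
the upper eta then clears the entries above it.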

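We now present the algorithm which updates the LU representation of
$Q^{-1}$ when the leaving column is nonkey. Assume that $\bar{Q}(k)$ is
replacing Q(k) in the working basis.

ALG 1: LU UPDATE FOR NONKEY LEAVING COLUMN

1. Set $\beta \leftarrow F^s \cdots F^1 \bar{Q}(k)$.

2. If k < m, set $t \leftarrow k$, $R^t \leftarrow I$, go to 4.

3. Set $L^m \leftarrow I$, where I is m by m.
   Set $L^m_{mm} \leftarrow 1/\beta_m$.
   Set $\bar{U}^m \leftarrow I$, where I is m by m.
   Set $\bar{U}^m_{jm} \leftarrow -\beta_j$, for $1 \le j < m$.
   Stop with $\bar{Q}^{-1} = U^1 \cdots U^{m-1} \bar{U}^m L^m F^s \cdots F^1$.

4. Set $\alpha \leftarrow R^t[k] \, U^{t+1}(t+1)$.
   Set $R^{t+1} \leftarrow R^t$.
   Set $R^{t+1}_{k,t+1} \leftarrow \alpha$.
   Set $\bar{U}^t \leftarrow U^{t+1}$.
   Set $\bar{U}^t_{k,t+1} \leftarrow 0$.
   ($R^t U^{t+1} = \bar{U}^t R^{t+1}$.)
   Set $t \leftarrow t + 1$.

5. If t < m, go to 4.
   ($U^{k+1} \cdots U^m = \bar{U}^k \cdots \bar{U}^{m-1} R^m$.)
   Set $\beta \leftarrow R^m \beta$.

6. Set $L^m \leftarrow I$, where I is m by m.
   Set $L^m_{kk} \leftarrow 1/\beta_k$.
   Set $L^m_{jk} \leftarrow -\beta_j/\beta_k$, for $k < j \le m$.
   Set $\bar{U}^m \leftarrow I$, where I is m by m.
   Set $\bar{U}^m_{jk} \leftarrow -\beta_j$, for $1 \le j < k$.
   Set $\bar{U}^m_{kk} \leftarrow 1$.
   Stop with
   $\bar{Q}^{-1} = U^1 \cdots U^{k-1} \bar{U}^k \bar{U}^{k+1} \cdots \bar{U}^m L^m R^m F^s \cdots F^1$.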

We now present the justification for step 3 of ALG 1. For k = m, we
claim that $\bar{Q}^{-1} = U^1 \cdots U^{m-1} \bar{U}^m L^m F^s \cdots F^1$.
Note that $\bar{Q}^{-1} \bar{Q}(m) = U^1 \cdots U^{m-1} \bar{U}^m L^m \beta$.
But by construction $\bar{U}^m L^m \beta = e^m$. Consider

Proposition 1.

Let $\beta$ be any m-vector and $E^i$ be any column eta. If $\beta_i = 0$,
then $E^i \beta = \beta$.

By Proposition 1, $U^1 \cdots U^{m-1} e^m = e^m$. Therefore,
$\bar{Q}^{-1} \bar{Q}(m) = e^m$. For $1 \le \ell < m$, let
$\gamma = F^s \cdots F^1 Q(\ell)$. By construction $\gamma_j = 0$ for
$\ell < j \le m$ and $\gamma_\ell = 1$. By Proposition 1,
$U^{\ell+1} \cdots U^{m-1} \bar{U}^m L^m \gamma = \gamma$. By the construction
of $U^1 \cdots U^\ell$, we have $U^1 \cdots U^\ell \gamma = e^\ell$. Therefore,
if the leaving column is Q(m), then step 3 of ALG 1 produces $\bar{Q}^{-1}$.

We now present a theoretical justification for step 4 of ALG 1.


Proposition 2.

Let $U^{p+1}$ be an upper eta whose nonidentity column is column t, with
entries $\eta_1, \ldots, \eta_m$, and let $R^p$ be a row eta whose
nonidentity row is row $\ell$, with entries $r_1, \ldots, r_m$, where
$t \ne \ell$. If $\bar{U}^p$ is $U^{p+1}$ with the entries of column t
replaced by

\[ \bar{\eta}_i = \begin{cases} 0, & \text{if } i = \ell \\ \eta_i, & \text{otherwise,} \end{cases} \]

and $R^{p+1}$ is $R^p$ with the entries of row $\ell$ replaced by

\[ \bar{r}_j = \begin{cases} \sum_i r_i \eta_i, & \text{if } j = t \\ r_j, & \text{otherwise,} \end{cases} \]

then $R^p U^{p+1} = \bar{U}^p R^{p+1}$.
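
The exchange of a row eta with an upper eta, as used in step 4 of ALG 1,
can be checked on a small instance. The sketch below computes the inner
product of the eta row with the eta column, stores it in the row eta,
and zeroes the corresponding entry of the upper eta. The function names,
0-based indices, and sample matrices are assumptions for illustration.

```python
def identity(m):
    return [[int(i == j) for j in range(m)] for i in range(m)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def move_row_eta_right(R, U, row, col):
    """Given row eta R (nonidentity row `row`) and upper eta U (nonidentity
    column `col`, with col != row), return (U_bar, R_next) such that
    R * U == U_bar * R_next."""
    alpha = sum(R[row][i] * U[i][col] for i in range(len(U)))
    R_next = [r[:] for r in R]
    R_next[row][col] = alpha          # new entry of the eta row
    U_bar = [r[:] for r in U]
    U_bar[row][col] = 0               # entry absorbed into the row eta
    return U_bar, R_next


m, row, col = 4, 1, 3
R = identity(m)
R[row] = [0, 1, 2, 0]                 # hypothetical eta row
U = identity(m)
for i, v in enumerate([5, 7, -1, 2]): # hypothetical eta column
    U[i][col] = v

U_bar, R_next = move_row_eta_right(R, U, row, col)
```

Only one inner product and two single-entry changes are needed, so the
row eta can be moved through the file without forming any full matrix
product.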

Proposition 2 is a theoretical justification for step 4 of ALG 1. The
proposition to follow shows the precise structure of
$R^m F^s \cdots F^1 Q$. Consider


Proposition 3.

Let $U^* = F^s \cdots F^1 Q$. If $\bar{U}^* = R^m U^*$, then

\[ \bar{U}^*[i] = \begin{cases} U^*[i], & \text{if } i \ne k \\ e^k, & \text{otherwise.} \end{cases} \]

We now present the results to prove that
$\bar{Q}^{-1} = U^1 \cdots U^{k-1} \bar{U}^k \cdots \bar{U}^m L^m R^m F^s \cdots F^1$.

Proposition 4.

$U^1 \cdots U^{k-1} \bar{U}^k \cdots \bar{U}^m L^m R^m F^s \cdots F^1 \bar{Q}(k) = e^k$.

Proposition 5.

$U^1 \cdots U^{k-1} \bar{U}^k \cdots \bar{U}^m L^m R^m F^s \cdots F^1 \bar{Q}(i) = e^i$
for $i \ne k$.

By Propositions 4 and 5, we have

Corollary 6.

$\bar{Q}^{-1} = U^1 \cdots U^{k-1} \bar{U}^k \cdots \bar{U}^m L^m R^m F^s \cdots F^1$.

Hence, ALG 1 produces the updated working basis inverse.

5.2 Key Column Leaves The Basis

In this section, we present an algorithm for updating the working basis
inverse to accomplish a switch between a key column and a nonkey
column. That is, $\bar{Q}^{-1} = R Q^{-1}$ where R is given by (13) and

\[ Q^{-1} = U^1 \cdots U^m F^s \cdots F^1. \tag{20} \]

We wish to obtain $\bar{Q}^{-1}$ in the same form as (20).

To accomplish this update, we begin with
$\bar{Q}^{-1} = R U^1 \cdots U^m F^s \cdots F^1$.
We apply Proposition 2 to $R U^1$, creating the factorization
$\bar{Q}^{-1} = \bar{U}^1 R^2 U^2 \cdots U^m F^s \cdots F^1$. We continue with
the application of Proposition 2 until we obtain
$\bar{Q}^{-1} = \bar{U}^1 \cdots \bar{U}^{k-1} R^k U^k \cdots U^m F^s \cdots F^1$.
Proposition 2 does not apply to $R^k U^k$. However, a simple update would be
to let $\bar{U}^k = \cdots = \bar{U}^m = I$ and use the below factorization:

\[ \bar{Q}^{-1} = \underbrace{\bar{U}^1 \cdots \bar{U}^m}_{\text{LEFT FILE}} \;
   \underbrace{R^k U^k \cdots U^m F^s \cdots F^1}_{\text{RIGHT FILE}}. \]

This update simply involves application of Proposition 2 until it does not
apply (i = k) and then shifting the remainder of the left file to the right
file. We call this update the TYPE 1 UPDATE.

We will now give an update in which $R^k U^k \cdots U^m$ is modified as
opposed to moving them to the right file. Let

\[ E^k = R^k U^k. \]

Then we define matrices $\bar{U}^{k+1}$ and $E^{k+1}$ such that
$E^k U^{k+1} = \bar{U}^{k+1} E^{k+1}$. Following this procedure,
$R^k U^k \cdots U^m$ can be replaced by $\bar{U}^{k+1} \cdots \bar{U}^m E^m$
so that

\[ \bar{Q}^{-1} = \bar{U}^1 \cdots \bar{U}^{k-1} \bar{U}^{k+1} \cdots \bar{U}^m E^m F^s \cdots F^1. \]

Further, we define a row eta R and a column eta F such that $E^m = RF$.
Therefore,

\[ \bar{Q}^{-1} = \underbrace{\bar{U}^1 \cdots \bar{U}^{k-1} \bar{U}^{k+1} \cdots \bar{U}^m}_{\text{LEFT FILE}} \;
   \underbrace{R F F^s \cdots F^1}_{\text{RIGHT FILE}}. \]


Proposition 8.

Let 


where  X  and  Y  are  such  that 


XY  =  y,  -  Z  YM 

i*  ( 

then  E  =  RF 

We now present the update algorithm for the case in which the $\ell$-th
column of T is being switched with the k-th column of C. Let
$y = e^\ell T^{-1} C$.

ALG 2: LU UPDATE FOR A KEY LEAVING COLUMN

1. Set $R^1 \leftarrow I$.
   Set $R^1[k] \leftarrow y$.
   Set $i \leftarrow 1$.

2. If i = k, go to 4.
   Set $\alpha \leftarrow R^i[k] \, U^i(i)$.
   Set $R^{i+1} \leftarrow R^i$.
   Set $R^{i+1}_{ki} \leftarrow \alpha$.
   Set $\bar{U}^i \leftarrow U^i$.
   Set $\bar{U}^i_{ki} \leftarrow 0$.

3. Set $i \leftarrow i + 1$ and go to 2.

4. Set $\bar{U}^k \leftarrow I$.

5. Apply Proposition 7 to $E^i U^{i+1}$ to form $\bar{U}^{i+1} E^{i+1}$.
   Set $i \leftarrow i + 1$.

6. If i < m, go to 5.

7. Apply Proposition 8 to $E^m$ to obtain RF, where X = 1.

At the completion of step 7 we have
$\bar{Q}^{-1} = \bar{U}^1 \cdots \bar{U}^m R F F^s \cdots F^1$.

VI. COMPUTATIONAL EXPERIMENTATION

Three test problems were selected for the experiment. Sc205 is a
staircase linear program which was generated by Ho and Loute [12] and
transformed into a network with side constraints. Gifford-Pinchot is a
model of the Gifford-Pinchot National Forest [10] which has also been
transformed into a network with side constraints. RAN is a randomly
generated problem.

These problems were first solved and the pivot agenda was saved.
That is, entering and leaving columns for each pivot were saved on a file.
This file was then used by each code so that all three basis updates follow
the same path to the optimum. The number of nonzeroes required to
represent $Q^{-1}$ at various points in the solution process is illustrated in
Figures 1 and 2. For both problems, the LU Type 2 update dominated
both the LU Type 1 update and the product-form code in terms of
nonzeroes in the inverse. The average core storage required for $Q^{-1}$
using the product-form update is approximately double that required for
the best LU update.

Given the above results, we developed three specialized network with side constraints codes and computationally compared them with three general in-core LP systems and a special system for multicommodity network flow problems. All codes are written in FORTRAN and have not been tailored to either our equipment or our FORTRAN compiler. None of the codes were tuned for our problem set. A brief description of each code follows.

NETSIDE1, NETSIDE2 and NETSIDE3 are our specialized network with side constraints systems. The first maintains $Q^{-1}$ in product form, while the second and third maintain $Q^{-1}$ in LU form using a Type 1 and Type 2 update, respectively. All use the Hellerman and Rarick [11] reinversion routine. The working basis is reinverted every 60 iterations. The pricing routine uses a candidate list of size 6 with block size of 200.

MINOS [15] stands for "a Modular In-Core Nonlinear Optimization System" and is designed to solve problems of the following form:

\[ \text{minimize } f(x) + cx \quad \text{subject to } Ax = b, \; l \le x \le u, \]

where f(x) is continuously differentiable in the feasible region. For this study f(x) = 0 at all x and therefore none of the nonlinear subroutines were used for problem solution.

Figure 1. Nonzero Buildup In The Working Basis Inverse On SC205 [22] (317 columns, 119 nodes, 87 side constraints). Horizontal axis: iterations; vertical axis: nonzeroes in $Q^{-1}$; curves: Product Form, LU Type 1, LU Type 2.

For linear programs, MINOS uses the revised simplex algorithm with all data and instructions residing in core storage. The basis inverse is maintained as an LU factorization using a Bartels-Golub update. The reinversion routine uses the Hellerman-Rarick [11] pivot agenda algorithm.

XMP is a library of FORTRAN subroutines which can be used to solve linear programs. The basis inverse is maintained in LU factored form. The pricing routine uses a candidate list of size 6 with two hundred columns being scanned each time the list is refreshed. The basis is reinverted every 50 iterations.

LISS stands for "Linear In-Core Simplex System" and is an in-core LP solver with the basis inverse maintained in product form. The reinversion routine is a modification of the work of Hellerman and Rarick [11]. The basis inverse is refactored every 50 iterations. A partial pricing scheme is used with 20 blocks.

Figure 2. Nonzero Buildup In The Working Basis Inverse On Gifford Pinchot [20] (1160 columns, 333 nodes, 84 side constraints).

MCNF stands for "Multicommodity Network Flow". MCNF also uses the primal partitioning algorithm. The basis inverse is maintained as a set of rooted spanning trees (one for each commodity) and a working basis inverse in product form. This working basis inverse has dimension equal to the number of binding GUB constraints. A partial pricing scheme is used. Our computational experience is given in Table 1. The row entitled "GUB Constraints" gives the number of LP rows which correspond to GUB constraints. The row entitled "Binding GUB Constraints" gives the number of GUB constraints met as equalities at optimality using MCNF. All runs were made on the CDC 6600 at Southern Methodist University using the FTN compiler with the optimization feature enabled.


Table 1. Comparison of Codes for Solving Multicommodity Network Flow Problems

Based on these results, we conclude that for lightly constrained multicommodity network flow problems

(i) XMP and MINOS run at approximately the same speed,

(ii) NETSIDE1, NETSIDE2 and NETSIDE3 run at approximately the same speed, and

(iii) the three NETSIDE codes are approximately twice as fast as XMP and MINOS.


This paper gives a mathematical programming model for the problem of assigning frequencies to nodes in a communications network. The objective is to select a frequency assignment which minimizes both cochannel and adjacent-channel interference. In addition, a design engineer has the option to designate key links in which the avoidance of jamming due to self interference is given a higher priority. The model has a nonconvex quadratic objective function, generalized upper-bounding constraints, and binary decision variables. We developed a special heuristic algorithm and software for this model and tested it on five test problems which were modifications of a real-world problem. Even though most of the test problems had over 600 binary variables, we were able to obtain a near optimum in less than 12 seconds of CPU time on a CDC Cyber-875.


1. INTRODUCTION

One  of  the  most  critical  design  problems  in  a  radio  communication  network  is 
the  assignment  of  transmit  frequencies  to  stations  (nodes)  so  that  designated  key 
communication links will not be jammed due to self interference. In this investigation, we describe a new optimization model and a solution technique which can be used to assist design engineers in this process.

1.1.  Problem  Description 

A  radio  communications  network  consists  of  radio  stations,  each  equipped  with 
one  or  more  transmitters  and  receivers.  When  a  given  station  has  the  ability  to 
receive  information  intelligibly  from  a  transmitting  station,  a  link  is  said  to  exist 
from  the  transmitting  station  to  the  receiving  station.  The  interconnection  of 
these  stations  and  links  may  be  viewed  graphically  as  a  set  of  nodes,  representing 
the  radio  stations,  joined  together  by  directed  arcs,  representing  the  links. 

We  assume  in  our  model  that  one  transmitter  and  several  receivers  are  located 
at  each  radio  station  (node).  The  transmitter  is  tuned  to  a  specified  center  fre¬ 
quency,  and  the  receivers  are  tuned  to  the  transmit  frequencies  of  the  neighboring 
stations to which the station is to be linked. A channel is associated with each center frequency in a way similar to the way channels and frequencies are associated in a television set. When a TV is tuned to channel 4, for example, it is really being tuned to receive video signals being broadcast at 67.25 MHz.

*Comments and criticisms from interested readers are cordially invited.

For  our  model,  a  given  center  frequency  will  be  associated  with  each  channel 
number.  Using  this  definition,  the  frequency  assignment  problem  may  be  defined 
as  follows:  Given  N  transmitting  stations  (nodes),  assign  1  of  F  transmit  channels 
to  each  node  in  such  a  way  as  to  minimize  the  number  of  designated  key  links 
jammed due to cochannel and adjacent-channel interference. We say that a link is jammed if either of the following conditions occurs: (i) a node receives two signals on the same channel that are less than $\alpha$ dB apart in signal strength, or (ii) a node receives a signal on a given channel while a neighboring node transmits on an adjacent channel. If the neighbor's signal strength exceeds the signal strength of the current node by more than $\beta$ dB, then the incoming signal will be garbled. The constants $\alpha$ and $\beta$ are functions of the hardware used in the network. Some of the determining factors are the receiver selectivity, the type of signal modulation, and the purity of the signal.

We now introduce the notation used to describe the mathematical model. Let $f \in \{1, \ldots, F\}$ denote a channel and $n \in \{1, \ldots, N\}$ denote a node. $e_i$ will denote a vector whose entries are 0 except for the $i$th, which is 1. Let $x_{fn} = 1$ if channel $f$ is assigned to node $n$ and 0 otherwise, $x_f$ = the row vector $[x_{f1}, \ldots, x_{fN}]$, and $g(x_1, \ldots, x_F)$ = a weighted number of jammed links with assignment $(x_1, \ldots, x_F)$. Using the above notation, the mathematical model of the frequency assignment problem is

\[ \min \; g(x_1, \ldots, x_F) \tag{1} \]

\[ \text{s.t.} \quad \sum_{f} x_{fn} = 1, \quad \text{for all } n \tag{2} \]

\[ x_{fn} \in \{0,1\}, \quad \text{for all } f,n. \tag{3} \]

For this application, $g(\cdot)$ is a nonconvex quadratic function and therefore (1)-(3) are members of the class of binary nonconvex cost nonlinear programs.

1.2.  Related  Literature 

A  heuristic  procedure  for  solving  a  similar  problem  using  a  graph  coloring 
algorithm has been evaluated by Zoellner and Beall [7]. Closely related models have been investigated by Morito, Salkin, and Williams [5] and by Mathur, Salkin, Nishimura, and Morito [4]. Their models are general linear integer programs with a single constraint. Using a special branch-and-bound algorithm, they successfully solved their model with up to fifty channels.

1.3. Accomplishments of the Investigation

We developed a new mathematical model of the frequency assignment problem which takes the form of a binary nonconvex quadratic cost nonlinear program. The model incorporates weighting constants that allow a design engineer to tune the model to a particular application. We present a specialization of the convex simplex algorithm to obtain a local optimum for this model. In addition, specialized software has been developed for this model and tested on five versions of a real-world problem. The software required less than a minute of computer time for all five test problems.


2.  THE  OBJECTIVE  FUNCTION 

In this section, we define the weighted interference function, $g(x_1, \ldots, x_F)$. This function is generated from a set of signal strength matrices, $(A_1, \ldots, A_F)$, two weighting matrices, and a set of critical values $\alpha$, $\beta$, and $\delta_1, \ldots, \delta_N$. Let $a^f_{ij}$ denote the received signal strength in dBu/m of a signal which originates at node $i$ and is received by node $j$, and let $A_f$ denote the matrix whose elements are $a^f_{ij}$. Let the weighting matrices $P = [p_{ij}]$ and $W = [w_{ij}]$ be determined as follows:

\[ p_{ij} = \begin{cases} p_1, & \text{if } (i,j) \text{ is a designated key link} \\ p_2, & \text{otherwise} \end{cases} \]

and

\[ w_{ij} = \begin{cases} w_1, & \text{if } (i,j) \text{ is a designated key link} \\ w_2, & \text{otherwise.} \end{cases} \]

The constants $p_1$, $p_2$, $w_1$, and $w_2$ are tuning parameters which are used to provide weights in the interference function for the key links.

Gamma is used to denote the scalar function, defined by

\[ \gamma(x) = \begin{cases} 1, & \text{if } x > 0, \\ 0, & \text{otherwise.} \end{cases} \]

Using $\gamma(\cdot)$, we define the three matrices

Using these matrices, the interference function is given by

\[ g(x_1, \ldots, x_F) = \underbrace{\sum_{f=1}^{F} x_f Q_f x_f'}_{\text{cochannel interference}} + \underbrace{\sum_{f=1}^{F-1} x_f R_f x_{f+1}'}_{\text{adjacent-channel interference from channel above}} + \underbrace{\sum_{f=2}^{F} x_f S_f x_{f-1}'}_{\text{adjacent-channel interference from channel below}} \]

In addition it is often desirable to use all of the channels. Therefore, we appended the function


i-A’-l  i-N 

S  *  2  * 
/  ■-!  /-,♦! 


to $g(\cdot)$ so that in the absence of self interference, the channels would be equally distributed among the nodes. The scalar is also a tuning parameter.

Using the above formulae, we now give an example which presents the matrices required to define $g(\cdot)$. Let $\alpha = 2$, $\beta = 3$, W3 = 0, $\delta_n = 0$ for all $n$, and $p_{ij} = w_{ij} = 1$ for all $i,j$. If


and

3. THE ALGORITHM


0 

1 

0 

1 


Let $x' = [x_1, \ldots, x_F]$. Then the frequency assignment problem takes the general form:

\[ \min \; g(x) = x'Cx \tag{4} \]

\[ \text{s.t.} \quad \sum_{f} x_{fn} = 1, \quad \text{for all } n \tag{5} \]

\[ x_{fn} \in \{0,1\}, \quad \text{for all } f,n \tag{6} \]

where the diagonal elements of C are 0 and all other elements are positive. The continuous relaxation of (4)-(6) is obtained by replacing (6) with

\[ 0 \le x_{fn} \le 1, \quad \text{for all } f,n. \tag{7} \]

The model (4), (5), (7) is a nonconvex quadratic program and a local optimum can be efficiently obtained by application of the convex simplex algorithm as described in Zangwill [6]. Suppose we begin with a feasible integer solution $\bar{x}' = [\bar{x}_1, \ldots, \bar{x}_F]$. We assume that all nonbasic variables have a value of zero. Let $l_1, \ldots, l_N$ denote the subscripts such that $\bar{x}_{l_1 1} = \cdots = \bar{x}_{l_N N} = 1$. Then a nonbasic variable $x_{fn}$ with a value of zero prices favorably if $[\nabla g(\bar{x})]'(e_i - e_j) < 0$, where $i = (f - 1)N + n$ and $j = (l_n - 1)N + n$. The line search for this problem requires that we solve the problem

\[ \min_{0 \le \lambda \le 1} \; g(\bar{x} + (e_i - e_j)\lambda). \tag{8} \]


But

\[ \frac{dg(\bar{x} + (e_i - e_j)\lambda)}{d\lambda} = (e_i - e_j)'\nabla g(\bar{x} + (e_i - e_j)\lambda) \]

\[ = (e_i - e_j)'(C + C')(\bar{x} + (e_i - e_j)\lambda) \]

\[ = [\nabla g(\bar{x})]'(e_i - e_j) + \lambda (e_i - e_j)'(C + C')(e_i - e_j). \]

Since $x_{fn}$ priced favorably, $[\nabla g(\bar{x})]'(e_i - e_j) < 0$. Also, $(e_i - e_j)'(C + C')(e_i - e_j) = e_i'(C + C')e_i + e_j'(C + C')e_j - e_i'(C + C')e_j - e_j'(C + C')e_i$. But the diagonal elements of $(C + C')$ are 0 and all other elements are non-negative, so this quantity is nonpositive and the derivative is negative for every $\lambda$ in $[0,1]$. Hence, the solution to (8) is $\lambda^* = 1$. Integrating the derivative from 0 to 1, the exact change to the objective function is $[\nabla g(\bar{x})]'(e_i - e_j) + \tfrac{1}{2}(e_i - e_j)'(C + C')(e_i - e_j) = [\nabla g(\bar{x})]'(e_i - e_j) - e_i'(C + C')e_j$, a strict decrease. Therefore, in the new solution $x_{fn}$ is set to 1 and $x_{l_n n}$ is set to 0. Since this holds for every iteration of the convex simplex algorithm, integrality is maintained and a local optimum for (4)-(6) can be obtained by finding a local optimum for (4), (5), (7).

Figure 1. 43-node communication network with designated key links. (Legend: • relay node.)
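
The size of the objective change produced by a full step can be checked numerically. The following is a minimal sketch, assuming a small random zero-diagonal matrix C and 0-based indices; the helper names `objective` and `swap_change` are illustrative only and are not part of the authors' FORTRAN code. It compares the directly computed change in $x'Cx$ with the closed form $[\nabla g(x)]'(e_i - e_j) + (e_i - e_j)'C(e_i - e_j)$, which for a zero-diagonal C reduces to the gradient term minus $c_{ij} + c_{ji}$.

```python
import numpy as np

def objective(C, x):
    """Quadratic objective g(x) = x' C x."""
    return float(x @ C @ x)

def swap_change(C, x, i, j):
    """Closed-form change in g when x[i] goes 0 -> 1 and x[j] goes 1 -> 0.

    Assumes the diagonal of C is zero, so d'Cd = -(C[i, j] + C[j, i])
    for the direction d = e_i - e_j.
    """
    grad = (C + C.T) @ x
    return float(grad[i] - grad[j] - (C[i, j] + C[j, i]))

# Hypothetical test data: integer-valued, nonnegative, zero diagonal.
rng = np.random.default_rng(0)
C = rng.integers(0, 5, size=(6, 6)).astype(float)
np.fill_diagonal(C, 0.0)

x = np.array([0, 1, 0, 1, 0, 0], dtype=float)
i, j = 0, 1                      # bring variable i in, take variable j out
x_new = x.copy()
x_new[i], x_new[j] = 1.0, 0.0

direct = objective(C, x_new) - objective(C, x)
closed_form = swap_change(C, x, i, j)
```

Because the off-diagonal entries are nonnegative, the correction term can only make the change more negative than the gradient term alone, which is why a favorable price guarantees a strict decrease.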

Let  x  be  any  initial  assignment  for  the  frequency  assignment  problem.  Using 
this  initial  assignment,  the  algorithm  may  be  stated  as  follows: 

For $f = 1, \ldots, F$.

    For $n = 1, \ldots, N$.

        $l_n := k$, where $x_{kn} = 1$.
        $i := (f - 1)N + n$.
        $j := (l_n - 1)N + n$.
        $p := [\nabla g(x)]'(e_i - e_j)$.

        If $p < 0$
        then
            $x_{l_n n} := 0$
            $x_{fn} := 1$

Repeat as long as $p < 0$ for some $f$ and some $n$.
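
The loop above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' FORTRAN implementation: indices are 0-based, the gradient is recomputed from scratch at every candidate swap (a real code would update it incrementally), and the toy cost matrix built by the hypothetical helper `build_cochannel_cost` charges 1 for every pair of nodes sharing a channel.

```python
import numpy as np

def build_cochannel_cost(N, F):
    """Toy cost matrix (an assumption for illustration): cost 1 per pair of
    distinct nodes n < m assigned the same channel; zero diagonal."""
    C = np.zeros((F * N, F * N))
    for f in range(F):
        for n in range(N):
            for m in range(n + 1, N):
                C[f * N + n, f * N + m] = 1.0
    return C

def local_search(C, channel, F):
    """Swap-based local search for min x'Cx with one channel per node.

    channel[n] in 0..F-1 is the channel currently assigned to node n.
    Returns the final channel list and its objective value.
    """
    N = len(channel)
    channel = list(channel)
    x = np.zeros(F * N)
    for n, f in enumerate(channel):
        x[f * N + n] = 1.0
    S = C + C.T
    improved = True
    while improved:                      # repeat while some swap prices favorably
        improved = False
        for f in range(F):
            for n in range(N):
                if channel[n] == f:
                    continue             # x_fn is already the basic variable
                i = f * N + n            # candidate entering variable
                j = channel[n] * N + n   # current basic variable for node n
                grad = S @ x
                p = grad[i] - grad[j]
                if p < 0:
                    x[j], x[i] = 0.0, 1.0
                    channel[n] = f
                    improved = True
    return channel, float(x @ C @ x)

C = build_cochannel_cost(N=3, F=2)
channels, cost = local_search(C, [0, 0, 0], F=2)
```

Each accepted swap strictly lowers the objective and there are finitely many assignments, so the loop terminates at a point where no single-node channel change prices favorably.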

4.  COMPUTATIONAL  EXPERIENCE 

We implemented the frequency assignment algorithm in a FORTRAN code. All data, including the matrices $Q_f$, $R_f$, and $S_f$, are stored in high speed core. Special subroutines were written to evaluate both $g(\cdot)$ and $\nabla g(\cdot)$ at a point. The code begins with $F$ different starting solutions and stops when a local optimum is found. The initial assignment for run $r \in \{1, \ldots, F\}$ is to assign frequency $\{[(n + r - 2) \text{ modulo } F] + 1\}$ to node $n$. The best solution obtained from all $F$ runs is the output.
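
The starting rule is easy to state in code. The sketch below uses the 1-based node and channel numbering of the text; the function name `initial_assignment` is illustrative only.

```python
def initial_assignment(r, N, F):
    """Channel given to each node n = 1..N in run r (1-based, as in the text):
    node n receives channel ((n + r - 2) mod F) + 1."""
    return [((n + r - 2) % F) + 1 for n in range(1, N + 1)]

run_1 = initial_assignment(1, N=5, F=3)
run_2 = initial_assignment(2, N=5, F=3)
```

Successive runs shift the cyclic pattern by one channel, so the F starting points are distinct whenever F is at least 2.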


Table 1. Computational Results With 43 Node Model

Row description: Problem 1, 2, 3, 4, 5

$\alpha$ (dB): 10, 10, 12
$\beta$ (dB): 25, 25, 25, 25
F (channels): 10, 12, 14, 14, 14
Binary variables: 516
Iterations: 525, 341, 497, 526
Solution time (secs): 5, 7, 11, 11
Initial objective value: 3153, 1561, 1875, 1671, 1670
Final objective value: 164, 103, 79, 5, 4
Jammed key links: 8, 4, 3

Five test problems were generated from the real-world 43-node network illustrated in Fig. 1. The lines connecting nodes are the designated key links. The
problems  all  have  the  same  topology  but  differ  in  the  selection  of  the  critical 
values  and  the  weighting  constants.  A  random  assignment  was  generated  and  the 
matrices  were  modified  so  that  this  assignment  produced  a  cost  of  zero.  Hence, 
the  optimal  objective  value  for  each  problem  is  zero. 

Our  computational  experience  is  reported  in  Table  1.  All  runs  were  made  on 
a  CDC  Cyber-875  using  the  FTN5  compiler  with  OPT  =  2.  The  initial  objective 
value  row  is  the  average  objective  value  for  the  F  initial  solutions.  Note  that  all 
five  problems  were  run  in  less  than  1  minute  of  CPU  time  and  the  final  objective 
values  were  quite  close  to  the  optimum  as  compared  to  the  initial  assignments. 

5.  CONCLUSIONS 

Our  optimization  model  and  computer  software  provide  a  practical  approach 
to  assist  communication  network  designers  in  obtaining  near-optimal  solutions 
for the frequency assignment problem. The fact that the diagonal elements of C in the quadratic objective function x'Cx are zero allows a very efficient implementation of the convex simplex method which maintains integrality. Hence, if we begin with an integer assignment, the convex simplex algorithm follows a sequence of integer points until a local minimum is obtained. This procedure is so fast that very large problems can be easily handled by this approach.


This paper generalizes a practical convergence result first presented by Polyak. This new result presents a theoretical justification for the step size which has been successfully used in several specialized algorithms which incorporate the subgradient optimization approach.

Key words: Subgradient optimization, nonlinear programming, convergence.


1. The subgradient algorithm

Let $G \ne \emptyset$ be a closed and convex subset of $R^n$. For each $y \in R^n$, define the projection of $y$ on $G$, denoted by $P(y)$, to be the unique point of $G$ such that for all $z \in G$, $\|P(y) - y\| \le \|z - y\|$. It is well known that the projection exists in this case and that for all $x, y \in R^n$, $\|P(x) - P(y)\| \le \|x - y\|$.
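
As a concrete instance, the projection onto a box is a componentwise clip, and the nonexpansive property can be spot-checked on random points. This is a minimal sketch assuming G is the unit square; `project_box` is an illustrative name and the sample size is arbitrary.

```python
import numpy as np

def project_box(y, lower, upper):
    """Euclidean projection onto the box {z : lower <= z <= upper}."""
    return np.clip(y, lower, upper)

rng = np.random.default_rng(1)
pts = rng.uniform(-3.0, 3.0, size=(50, 2))

# Check ||P(a) - P(b)|| <= ||a - b|| on all sampled pairs.
nonexpansive = all(
    np.linalg.norm(project_box(a, 0.0, 1.0) - project_box(b, 0.0, 1.0))
    <= np.linalg.norm(a - b) + 1e-12
    for a in pts
    for b in pts
)
corner = project_box(np.array([2.0, -1.0]), 0.0, 1.0)
```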

Let $g$ be a finite convex functional on $G$. For each $y \in G$, define the subdifferential of $g$ at $y$ by

\[ \partial g(y) = \{\eta : \eta \in R^n \text{ and for all } z \in G, \; g(z) \ge g(y) + \eta \cdot (z - y)\}. \]

Any $\eta \in \partial g(y)$ is called a subgradient of $g$ at $y$. It is well known that if $y$ is a point at which $g$ is differentiable, then $\partial g(y) = \{\nabla g(y)\}$, a singleton set.

It is also well known that on the relative interior of $G$, $g$ is continuous and the subdifferential of $g$ always exists. That this may not be the case on the relative boundary is shown in the following simple example.

Example 1. Let $G = [0, 1]$ in $R$. The finite convex function $f: G \to R$ given by

\[ f(y) = 0, \quad 0 \le y < 1, \]

fails to have a subgradient and is discontinuous at the boundary point $y = 1$.

That the notions of continuity and subdifferentiability are independent on the relative boundary of $G$ is shown by the following examples.

Example 2 [21, p. 229]. Let $G = [0,1]$ in $R$. The finite convex function $f: G \to R$ given by $f(y) = -(1 - y)^{1/2}$ is continuous on $G$ but fails to have a subgradient at the boundary point $y = 1$.
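
The failure in Example 2 can be made concrete. A subgradient $\eta$ at $y = 1$ would have to satisfy $f(z) \ge f(1) + \eta(z - 1)$ for every $z \in [0,1]$; writing $z = 1 - t$ with $t > 0$ this reads $-\sqrt{t} \ge -\eta t$, that is, $\eta \ge 1/\sqrt{t}$. The short sketch below (the function name `required_slope` is illustrative) tabulates this lower bound as $t$ shrinks.

```python
import math

def required_slope(t):
    """Lower bound on any subgradient of f(y) = -(1 - y)**0.5 at y = 1
    implied by the single test point z = 1 - t, for 0 < t <= 1."""
    return 1.0 / math.sqrt(t)

# Test points approaching the boundary: t = 1e-2, 1e-4, ..., 1e-12.
bounds = [required_slope(10.0 ** (-2 * k)) for k in range(1, 7)]
```

The bounds grow without limit as the test point approaches 1, so no finite $\eta$ can satisfy all of them at once.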


Example  3  [6,  p.  96],  Let  G  =  {(y,,  y2):  0«y2«y,  <  1}  in  R3.  The  finite  (but 
unbounded)  convex  functional  f  :G-*R  given  by 


/(>)  = 


(>2)2/y„ 

0, 


>1  =  >2  =  0, 


is  discontinuous  at  (0, 0)  but  has  a  subgradient  everywhere  on  G. 


Consider  the  nonlinear  programming  problem  given  by 

minimize  g(y)  (NLP/SD) 

subject  to  y  e  G, 

where we assume that for all $y \in G$, $\partial g(y) \ne \emptyset$ and that the set of optimal points $\Gamma \ne \emptyset$. We denote the optimal objective value by $\gamma$.

The subgradient optimization algorithm for the solution of NLP/SD was first introduced by Shor [23] and may be viewed as a generalization of the steepest descent method in which any subgradient is substituted for the gradient at a point where the gradient does not exist. This algorithm uses a sequence of positive step sizes $\{s_i\}$, which in turn depend on a predetermined sequence of fixed constants $\{\lambda_i\}$ and (in some cases) certain other quantities.


Subgradient  optimization  algorithm 

Step  0  (Initialization) 

Let $y_0 \in G$ and set $i \leftarrow 0$.

Step 1 (Find Subgradient and Step Size)

Obtain some $\eta_i \in \partial g(y_i)$.

If $\eta_i = 0$, terminate with $y_i$ optimal; otherwise, select a step size $s_i$.

Step 2 (Move to New Point)

Set $y_{i+1} \leftarrow P(y_i - s_i \eta_i)$, $i \leftarrow i + 1$, and return to step 1.
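
The three steps translate directly into a short routine. The sketch below is an illustration under assumptions, not code from the paper: the caller supplies the objective, a subgradient oracle, the projection and a step-size rule; an iteration limit stands in for a stopping rule; and the best point seen is tracked because the objective values need not decrease monotonically. The one-dimensional test problem and the step rule $s_i = 1/((i+1)\|\eta_i\|)$ are hypothetical choices.

```python
import numpy as np

def subgradient_method(g, subgrad, project, y0, step, max_iter=200):
    """Projected subgradient iteration y_{i+1} = P(y_i - s_i * eta_i).

    step(i, g_i, eta) returns the step size s_i. Returns the best point
    found and its objective value.
    """
    y = project(np.asarray(y0, dtype=float))
    best_y, best_g = y.copy(), g(y)
    for i in range(max_iter):
        eta = subgrad(y)
        if not np.any(eta):              # zero subgradient: y is optimal
            return y, g(y)
        y = project(y - step(i, g(y), eta) * eta)
        if g(y) < best_g:
            best_y, best_g = y.copy(), g(y)
    return best_y, best_g

# Hypothetical example: minimize |y - 1| over G = [-5, 5].
best_y, best_g = subgradient_method(
    g=lambda y: float(np.abs(y - 1.0).sum()),
    subgrad=lambda y: np.sign(y - 1.0),
    project=lambda y: np.clip(y, -5.0, 5.0),
    y0=[5.0],
    step=lambda i, g_i, eta: 1.0 / ((i + 1) * np.linalg.norm(eta)),
)
```

With these step lengths the iterate walks toward the minimizer and then oscillates around it with shrinking amplitude, which is why the best value, rather than the last value, is reported.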


Unfortunately, the termination criterion in step 1 may not hold at any member of $\Gamma$ and is thus computationally ineffective. Hence, some other stopping rule must be devised. In practice this is often a limit on the number of iterations. The functional values produced by the algorithm will be denoted by $g_i = g(y_i)$.

Various proposals have been offered for the selection of the step sizes. Four general schema which have been suggested are:

\[ s_i = \lambda_i, \tag{1} \]

\[ s_i = \lambda_i / \|\eta_i\|, \tag{2} \]

\[ s_i = \lambda_i / \|\eta_i\|^2, \tag{3} \]

\[ s_i = \lambda_i (g_i - p) / \|\eta_i\|^2, \tag{4} \]

where $p$, the target value, is an estimate of $\gamma$ and all $\lambda_i > 0$.
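
For reference, the four schema can be written as small functions of the current constant, objective value, subgradient and target. This is a minimal sketch; the function names and the common argument list are assumptions made for illustration.

```python
import numpy as np

def step_schema_1(lam, g_i, eta, p):
    """s_i = lambda_i."""
    return lam

def step_schema_2(lam, g_i, eta, p):
    """s_i = lambda_i / ||eta_i||."""
    return lam / np.linalg.norm(eta)

def step_schema_3(lam, g_i, eta, p):
    """s_i = lambda_i / ||eta_i||^2."""
    return lam / np.linalg.norm(eta) ** 2

def step_schema_4(lam, g_i, eta, p):
    """s_i = lambda_i * (g_i - p) / ||eta_i||^2, with target value p."""
    return lam * (g_i - p) / np.linalg.norm(eta) ** 2

# Hypothetical numbers: ||eta|| = 5, g_i = 10, target p = 4, lambda_i = 0.5.
eta = np.array([3.0, 4.0])
steps = [s(0.5, 10.0, eta, 4.0)
         for s in (step_schema_1, step_schema_2, step_schema_3, step_schema_4)]
```

Only schema (4) uses the gap between the current objective value and the target, so its steps shrink automatically as the iterates approach the target.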

The  papers  of  Polyak  [19]  and  Held,  Wolfe  and  Crowder  [12]  have  provided  the 
major  impetus  for  widespread  practical  application  of  the  algorithm.  Schema  (4) 
has  proven  to  be  a  particularly  popular  choice  among  experimenters.  Theorem  4  of 
Polyak  [19]  is  the  most  often  quoted  convergence  result  justifying  use  of  this  schema. 
For  many  mathematical  programming  models,  the  target  value  is  a  lower  bound  on 
the optimum (see, e.g., [2, 3, 4, 5, 13, 14, 22]). For this case Polyak's Theorem 4, using schema (4), requires that $\lambda_i = 1$ for all $i$. For all the above studies, a decreasing sequence of $\lambda$'s was found to work better than $\lambda_i = 1$ for all $i$. Hence, the existing theory did not justify what we had found to work well in practice. The objective of this paper is to present improved theoretical results which help to explain what has been found to work well in practice. Specifically, we loosen slightly the restrictions imposed on the sequence $\{\lambda_i\}$, and obtain a more general result when the target value is less than the optimum.

The  literature  on  the  subgradient  algorithm  is  extensive,  much  of  it  in  Russian. 
Good  coverage  is  contained  in  the  bibliographies  of  [20],  [24],  and  [25].  Much  of 
this  literature  has  grown  up  in  conjunction  with  the  relaxation  method  for  solving 
linear  inequalities  (see,  e.g.,  [1,7, 10, 11, 15]). 

2.  Polyak’s  convergence  results 

The results of Theorem 4 of Polyak [19] use the following general restrictions on the sequence $\{\lambda_i\}$ used with schema (4):

\[ 0 < \alpha \le \lambda_i \le \beta < 2, \tag{5} \]

where $\alpha$ and $\beta$ are fixed constants.

The results contained in this theorem include, under (4), (5), and (essentially) the assumption that there is some $\kappa > 0$ such that $\|\eta_i\| < \kappa$:

(A) if $p \ge \gamma$, either

(a) there is some $n$ such that $g_n < p$, or

(b) all $g_n \ge p$ and $\lim g_n = p$;
and

(B) if $p < \gamma$ and all $\lambda_n = 1$, given $\delta > 0$,
there is some $n$ such that $g_n \le \gamma + (\gamma - p) + \delta$.

If (b) occurs in part (A), the convergence is geometric. Polyak's theorem contains additional results for $G = R^n$ (so that the projection is superfluous), with all $\lambda_n = 1$ and $p \ge \gamma$, in which case geometric or faster convergence to the target value is obtained.

In the next section we will relax condition (5) to the following:

\[ 0 < \lambda_i \le \beta < 2 \quad \text{and} \quad \sum_i \lambda_i = \infty, \tag{6} \]

where $\beta$ is a fixed constant. With this relaxation (which allows the sequence $\{\lambda_i\}$ to approach zero) we present results analogous to (A) and an interesting generalization of (B).

Stronger convergence results are available for special cases, e.g. where set $G$ contains a set

\[ H = \{x : f_i(x) \le 0, \; i = 1, \ldots, m\}, \]

with each $f_i$ convex and $H$ having a nonempty interior (see, e.g., [8, 16, 17, 18] and [25, Ch. 2]).

3.  New  convergence  results 

The main results in this section appear in Propositions 5, 7, 9, and 10. Propositions 5 and 7 correspond to part (A) of Polyak's Theorem 4 with slightly weaker conditions on the sequence $\{\lambda_i\}$ and Proposition 9 is a generalization of part (B) of Theorem 4. Proposition 10 is a new result apparently obtainable only when the conditions on $\{\lambda_i\}$ are weakened so that we may require $\lambda_i \to 0$.

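Proposition 1. If $y \in G$, then

\[ \|y - y_{i+1}\|^2 \le \|y - y_i\|^2 + s_i^2 \|\eta_i\|^2 + 2 s_i (g(y) - g_i). \]

Proof. Let $y \in G$.

\[ \|y - y_{i+1}\|^2 = \|y - P(y_i - s_i\eta_i)\|^2 = \|P(y) - P(y_i - s_i\eta_i)\|^2 \le \|y - y_i + s_i\eta_i\|^2 \]
\[ = \|y - y_i\|^2 + s_i^2\|\eta_i\|^2 + 2 s_i \eta_i \cdot (y - y_i) \]
\[ \le \|y - y_i\|^2 + s_i^2\|\eta_i\|^2 + 2 s_i (g(y) - g_i). \]

Proposition 2. If $y \in \Gamma$, then under (4),

\[ \|y - y_{i+1}\|^2 \le \|y - y_i\|^2 + \lambda_i (g_i - p)[\lambda_i (g_i - p) - 2(g_i - \gamma)] / \|\eta_i\|^2. \]

Proof. Let $y \in \Gamma$. Substituting in Proposition 1 for $s_i$ from (4) and using $g(y) = \gamma$, we obtain

\[ \|y - y_{i+1}\|^2 \le \|y - y_i\|^2 + \lambda_i^2 (g_i - p)^2 / \|\eta_i\|^2 + 2\lambda_i (g_i - p)(\gamma - g_i) / \|\eta_i\|^2 \]
\[ = \|y - y_i\|^2 + \lambda_i (g_i - p)[\lambda_i (g_i - p) - 2(g_i - \gamma)] / \|\eta_i\|^2. \]

Proposition 3. If $y \in \Gamma$, $p \ge \gamma$, and $g_i \ge p$, then under (4),

\[ \|y - y_{i+1}\|^2 \le \|y - y_i\|^2 + \lambda_i (\lambda_i - 2)(g_i - p)^2 / \|\eta_i\|^2. \]

Proof. Let $y \in \Gamma$, $p \ge \gamma$, and $g_i \ge p$. Now

\[ \lambda_i (g_i - p) - 2(g_i - \gamma) \le \lambda_i (g_i - p) - 2(g_i - p) = (\lambda_i - 2)(g_i - p). \]

Thus,

\[ \lambda_i (g_i - p)[\lambda_i (g_i - p) - 2(g_i - \gamma)] / \|\eta_i\|^2 \le \lambda_i (\lambda_i - 2)(g_i - p)^2 / \|\eta_i\|^2. \]

The result now follows from Proposition 2.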

Proposition 4. If $y \in \Gamma$, $p \ge \gamma$, and all $g_i \ge p$ under (4) with all $\lambda_i < 2$, then the sequence $\{\|y - y_i\|^2\}$ is monotone decreasing and converges to some $\psi \ge 0$.

Proof. Let $y \in \Gamma$, $p \ge \gamma$, all $\lambda_i < 2$, and all $g_i \ge p$. Since each $\lambda_i < 2$, then also each $\lambda_i (\lambda_i - 2)(g_i - p)^2 / \|\eta_i\|^2 \le 0$, and from Proposition 3, $\{\|y - y_i\|^2\}$ is a monotone decreasing sequence. This sequence is bounded below by zero and thus converges to some nonnegative value, say $\psi$.

Proposition 5. If $p \ge \gamma$ and there is some $\kappa > 0$ such that all $\|\eta_i\| < \kappa$, then under (4) and (6), given $\delta > 0$, there is some $M$ such that $g_M \le p + \delta$.

Proof. Let $\delta > 0$ be given, with $p \ge \gamma$, and all $\|\eta_i\| < \kappa$. Suppose, contrary to the desired result, that all $g_i > p + \delta$. Take any $y \in \Gamma$. Then from Proposition 3,

\[ \lambda_i (2 - \lambda_i)(g_i - p)^2 / \|\eta_i\|^2 \le \|y - y_i\|^2 - \|y - y_{i+1}\|^2. \]

Since $\lambda_i \le \beta < 2$, $\|\eta_i\| < \kappa$, and $g_i - p > \delta$,

\[ \lambda_i (2 - \beta)\delta^2 / \kappa^2 \le \|y - y_i\|^2 - \|y - y_{i+1}\|^2. \tag{7} \]

Adding together the inequalities obtained from (7) by letting $i$ take on all values from 0 to $n$, we obtain

\[ (\lambda_0 + \cdots + \lambda_n)(2 - \beta)\delta^2 / \kappa^2 \le \|y - y_0\|^2 - \|y - y_{n+1}\|^2. \tag{8} \]

As $n$ goes to $\infty$, the left side of (8) goes to $\infty$, whereas, by Proposition 4, the right side of (8) goes to $\|y - y_0\|^2 - \psi$, a contradiction.

Proposition  5  gives  a  practical  convergence  result  when  the  target  exceeds  the 
optimal  value.  At  worst  we  eventually  obtain  an  objective  value  arbitrarily  close  to 
the  target  value.  At  best  we  may  obtain  an  objective  value  as  good  as  or  better  than 
the  target  value,  in  which  case  it  may  be  desirable  to  restart  the  algorithm  after 
supplying  a  new  target  value. 
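
A small numerical experiment illustrates the behavior described above. The sketch is built on assumptions chosen for illustration only: $g(y) = |y|$ on $G = [-10, 10]$ (so the optimal value is 0), target $p = 0.5$, constants $\lambda_i = 1/(i + 2)$ (which tend to zero, stay below 2 and have a divergent sum), and step sizes from schema (4). In this instance the gap $g_i - p$ is multiplied by $(1 - \lambda_i)$ at every step, so it equals $9.5/(i + 1)$ after $i$ steps from $y_0 = 10$.

```python
def first_hit(y0, p, delta, max_iter=1000):
    """Run the subgradient iteration for g(y) = |y| on G = [-10, 10] with
    s_i = lambda_i * (g_i - p) / eta_i**2 and lambda_i = 1 / (i + 2).
    Returns (M, g_M) for the first index M with g_M <= p + delta."""
    y = float(y0)
    for i in range(max_iter):
        g_i = abs(y)
        if g_i <= p + delta:
            return i, g_i
        eta = 1.0 if y > 0 else -1.0          # subgradient of |y| away from 0
        lam = 1.0 / (i + 2)
        s = lam * (g_i - p) / eta ** 2
        y = min(10.0, max(-10.0, y - s * eta))  # projection onto [-10, 10]
    return None

M, g_M = first_hit(y0=10.0, p=0.5, delta=0.1)
```

The objective comes within $\delta$ of the target after a finite number of steps even though the constants $\lambda_i$ tend to zero, which is the situation the relaxed condition on $\{\lambda_i\}$ is meant to cover.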

It  does  appear  theoretically  possible  that  no  iterate  may  have  an  objective  value 
as  good  as  or  better  than  the  target  value.  In  this  case,  we  obtain  convergence  results 
in Proposition 7 analogous to Polyak's Theorem 4, part (A), alternative (b).

Proposition 6. If $p \ge \gamma$ and all $g_i > p$ under (4) and (6), then the sequence $\{y_i\}$ is bounded.

Proof. Let $y \in \Gamma$, $p \ge \gamma$, and all $g_i > p$. Now,

\[ \|y_i\| = \|y_i - y + y\| \le \|y_i - y\| + \|y\|. \]

From Proposition 4, we then have that

\[ \|y_i\| \le \|y_0 - y\| + \|y\|. \]

Proposition 7. If $p \ge \gamma$, there is some $\kappa > 0$ such that all $\|\eta_i\| < \kappa$, and all $g_i > p$ under (4) and (6), then $\{y_i\}$ converges to some $z \in G$ and some subsequence $\{g_{N(j)}\}$ converges to $p \ge g(z)$. If $g$ is also continuous on $G$, then $\{g_i\}$ converges to $p = g(z)$.

Proof. Let $p \ge \gamma$, all $g_i > p$, and all $\|\eta_i\| < \kappa$. Using Proposition 5, we obtain a convergent subsequence of $\{g_i\}$. There is some $M(0)$ such that $g_{M(0)} \le p + 1$. Having determined $M(j)$, define $h_j = \min\{g_0, \ldots, g_{M(j)}\} - p > 0$ and $\delta_j = \min\{(1/2)^{j+1}, h_j/2\}$. Applying Proposition 5, there is some $M(j+1)$ such that $g_{M(j+1)} \le p + \delta_j$, and furthermore, $M(j+1) > M(j)$. As constructed, $\{g_{M(j)}\}$ converges to $p$. By Proposition 6, $\{y_{M(j)}\}$ is bounded, so a subsequence of $\{y_{M(j)}\}$, say $\{y_{N(j)}\}$, converges to some point $z$. Obviously, $\{g_{N(j)}\}$ also converges to $p$. Since $G$ is closed, $z \in G$. Let $\eta$ be any subgradient of $g$ at $z$. Thus for each $j$, $g_{N(j)} \ge g(z) + \eta \cdot (y_{N(j)} - z)$, from which it follows that $p \ge g(z)$. Now consider the ancillary functional $g^a: G \to R$, given by $g^a(y) = \max\{g(y), p\}$. Note that $g^a$ is convex and finite on $G$ and $z$ is a minimum point of $g^a$ over $G$. Also, since $g^a(y) \ge g(y)$ and each $g_i > p$, each subgradient $\eta_i$ of $g$ at $y_i$ is also a subgradient of $g^a$ at $y_i$. Thus under (4) and (6), for each $i$, $g_i = g^a_i$. By Proposition 4, applied to $g^a$, $\{y_i\}$ converges to $z$. When $g$ is continuous on $G$, $\lim g_i = g(z) = \lim g_{N(j)} = p$.

If  we  simply  require  that  subgradients  exist  for  all  points  generated  by  the 
subgradient  optimization  algorithms,  and  relax  the  requirement  that  subgradients 
exist  at  all  points  on  the  relative  boundary  of  G,  then,  contrary  to  the  result  in 
Proposition 7, when all $g_i > p$, $\{y_i\}$ can converge to a point $z$ with $g(z) > p$, as shown
in  the  following  example. 

Example  4.  Let  G  =  {(>-,, *):  1}  in  R\  The  finite  convex  functional 

f  :G-*R  given  by 


$(y_1, y_2) \ne (1, 1)$,

$(y_1, y_2) = (1, 1)$,


fails to have a subgradient at the boundary point $z = (1, 1)$. Also, $\gamma = 0$ with $\Gamma = \{(y_1, y_2) : y_1 = 1,\ 0 \le y_2 < 1\}$. Letting $\lambda_i = 1$ for all $i$, (6) holds. Then under (4), starting from $y_0 = (0, 1)$ with $p = 0 = \gamma$, the subgradient optimization algorithm generates the points $y_i = (1 - (1/2)^i, 1)$ with $g_i = (1/2)^i$. Now $\lim y_i = z$, but $\lim g_i = 0 = p < g(z) = 1$.

Proposition 8. If $y \in \Gamma$, $g_i \ge p$, and $\lambda_i \le \beta < 2$, then under (4),

$\|y - y_{i+1}\|^2 \le \|y - y_i\|^2 + \lambda_i (g_i - p)(2 - \beta)\,[(\gamma - g_i) + (\beta/(2 - \beta))(\gamma - p)] / \|\eta_i\|^2$.

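Proof. Let $y \in \Gamma$, $g_i \ge p$, and $\lambda_i \le \beta < 2$. Now,

$\lambda_i (g_i - p) - 2(g_i - \gamma) \le \beta (g_i - p) - 2(g_i - \gamma)$

$= \beta (g_i - p) - (2 - \beta)(g_i - \gamma) - \beta (g_i - \gamma)$

$= \beta (\gamma - p) + (2 - \beta)(\gamma - g_i)$

$= (2 - \beta)\,[(\gamma - g_i) + (\beta/(2 - \beta))(\gamma - p)]$.

Thus,

$\lambda_i (g_i - p)\,[\lambda_i (g_i - p) - 2(g_i - \gamma)] / \|\eta_i\|^2$

$\le \lambda_i (g_i - p)(2 - \beta)\,[(\gamma - g_i) + (\beta/(2 - \beta))(\gamma - p)] / \|\eta_i\|^2$.

The result now follows from Proposition 2.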

Proposition 9. If $p < \gamma$ and there is some $\kappa > 0$ such that all $\|\eta_i\| < \kappa$, then under (4) and (6), given $\delta > 0$, there is some $M$ such that $g_M \le \gamma + (\beta/(2 - \beta))(\gamma - p) + \delta$.

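Proof. Let $\delta > 0$ be given, with $p < \gamma$, and all $\|\eta_i\| < \kappa$. Suppose, contrary to the desired result, that all $g_i > \gamma + (\beta/(2 - \beta))(\gamma - p) + \delta$, or $(\gamma - g_i) + (\beta/(2 - \beta))(\gamma - p) < -\delta$. Since $\beta < 2$ and $g_i > p$, then

$\lambda_i (g_i - p)(2 - \beta)\,[(\gamma - g_i) + (\beta/(2 - \beta))(\gamma - p)] / \|\eta_i\|^2 < -\delta \lambda_i (g_i - p)(2 - \beta) / \|\eta_i\|^2$.  (9)

Take any $y \in \Gamma$. Then by (9) and Proposition 8, we have that

$\delta \lambda_i (g_i - p)(2 - \beta) / \|\eta_i\|^2 < \|y - y_i\|^2 - \|y - y_{i+1}\|^2$.

Since $\|\eta_i\| < \kappa$ and $g_i \ge \gamma > p$, then also

$\lambda_i \delta (\gamma - p)(2 - \beta) / \kappa^2 < \|y - y_i\|^2 - \|y - y_{i+1}\|^2$.  (10)

Adding together the inequalities obtained from (10) by letting $i$ take on all values from 0 to $n$, we obtain

$(\lambda_0 + \cdots + \lambda_n)\,\delta (\gamma - p)(2 - \beta) / \kappa^2 < \|y - y_0\|^2 - \|y - y_{n+1}\|^2$.  (11)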

As $n$ goes to $\infty$, the left side of (11) goes to $\infty$, whereas, by Proposition 4, the right side of (11) goes to $\|y - y_0\|^2 - \psi^2$, a contradiction.

The above is our generalization of Polyak's Theorem 4 Part (B). At worst we eventually obtain an objective value whose error is arbitrarily close to $\beta/(2 - \beta)$ times the error present in the target value estimate of $\gamma$.

Proposition 10. If $p < \gamma$, there is some $\kappa > 0$ such that all $\|\eta_i\| < \kappa$, and all $g_i > \gamma$ under (4) and (6) with $\lambda_i \to 0$, then there is a subsequence $\{g_{M(j)}\}$ which converges to $\gamma$.

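Proof. Let $p < \gamma$, all $\|\eta_i\| < \kappa$, $\lambda_i \to 0$, and all $g_i > \gamma$. Using Proposition 9, we obtain a convergent subsequence of $\{g_i\}$. Define $\beta_0 = \min\{1, 1/(\gamma - p)\}$ and $\delta_0 = 1$. There is some $N(0)$ such that for all $i \ge N(0)$, $\lambda_i \le \beta_0$. Then also $(\beta_0/(2 - \beta_0))(\gamma - p) \le 1$. Applying Proposition 9 to $\{g_i\}_{i \ge N(0)}$, there is some $M(0) \ge N(0)$ such that

$g_{M(0)} \le \gamma + (\beta_0/(2 - \beta_0))(\gamma - p) + \delta_0 \le \gamma + 2$.

Having determined $M(j)$, define $h_j = \min\{g_0, \ldots, g_{M(j)}\} - \gamma > 0$, $\delta_{j+1} = \min\{(1/2)^{j+1}, h_j/3\}$, and $\beta_{j+1} = \min\{1, \delta_{j+1}/(\gamma - p)\}$. There is some $N(j+1)$ such that for all $i \ge N(j+1)$, $\lambda_i \le \beta_{j+1}$. Then also $(\beta_{j+1}/(2 - \beta_{j+1}))(\gamma - p) \le \delta_{j+1}$. Applying Proposition 9 to $\{g_i\}_{i \ge N(j+1)}$, there is some $M(j+1) \ge N(j+1)$ such that

$g_{M(j+1)} \le \gamma + (\beta_{j+1}/(2 - \beta_{j+1}))(\gamma - p) + \delta_{j+1} \le \gamma + 2\delta_{j+1}$.

Then also

$g_{M(j+1)} \le \gamma + 2(1/2)^{j+1}$ and $g_{M(j+1)} \le \gamma + (2/3) h_j < \min\{g_0, \ldots, g_{M(j)}\}$,

so that $M(j+1) > M(j)$. As constructed, $\{g_{M(j)}\}$ converges to $\gamma$.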


4.  Conclusions 

Propositions 5 and 7 give the convergence results obtained under (4) and (6) for a target value at or above the optimal value. It is readily apparent that Proposition 5 is compatible with Polyak's result (A). Proposition 9 gives the corresponding result for a target value under the optimal value. We have found this to be a more practical result (see, e.g., [2, 3, 4, 5, 13, 14, 22]). Taking $\beta = 1$, we have Polyak's result (B) as a special case of Proposition 9. Proposition 9 shows more clearly the dependence of the demonstrably attainable error on the upper bound $\beta$ for $\{\lambda_i\}$. Requiring $\lambda_i \to 0$ allows us to produce a subsequence of objective values converging to the optimal objective as shown in Proposition 10. This paper has not addressed the question of any convergence rate associated with the use of (4) and (6). Goffin [9] has provided such results when schema (2) is used.

Abstract. This paper presents a new algorithm for the solution of a network problem with equal flow side constraints. The solution technique is motivated by the desire to exploit the special structure of the side constraints and to maintain as much of the characteristics of pure network problems as possible. The proposed algorithm makes use of Lagrangean relaxation to obtain a lower bound and decomposition by right-hand-side allocation to obtain upper bounds. The Lagrangean dual serves not only to provide a lower bound used to assist in termination criteria for the upper bound, but also allows an initial allocation of equal flows for the upper bound. The algorithm has been tested on problems with up to 1500 nodes and 6000 arcs. Computational experience indicates that solutions whose objective function value is well within 1% of the optimum can be obtained in 1%-65% of the MPSX time depending on the amount of imbalance inherent in the problem. Incumbent integer solutions which are within 99.99% feasible and well within 1% of the proven lower bound are obtained in a straightforward manner requiring, on the average, 30% of the MPSX time required to obtain a linear optimum.

Keywords. Networks, Algorithms, Computational Mathematical Programming, Decomposition, Lagrangean Relaxation

Acknowledgement.  This  research  was  supported  in  part  by  the  Air  Force  Office  of  Scientific 
Research  under  Contract  Number  AFOSR  83-0278,  the  Department  of  Defense  under  Contract 
Number  MDA  903-86-C-0182,  and  Rome  Air  Development  Center  under  Contract  Number 
SCEEE  PDP/86-75. 


1  Introduction 


This paper makes use of relaxation in conjunction with decomposition for the solution of the equal flow problem. The problem is easily conceptualized as a minimal cost network flow problem with additional constraints on certain pairs of arcs. Specifically, given pairs of arcs are required to take on the same value. The problem is defined on a network represented by an m x n node-arc incidence matrix, A, in which K pairs of arcs are identified and required to have equal flow. Mathematically, this is expressed as:

Minimize  $cx$
s.t.      $Ax = b$
          $x_k = x_{K+k}$,  $k = 1, 2, \ldots, K$
          $0 \le x \le u$
          $x$ integer

where c is a 1 x n vector of unit costs, b is an m x 1 vector of node requirements, 0 is an n x 1 vector of zeroes, x is an n x 1 vector of decision variables, and u is an n x 1 vector of upper bounds. This mathematical statement of the problem, henceforth referred to as problem I1, assumes that the first 2K arcs appear in the equal flow constraints. This is not a restrictive assumption, since by rearranging the order of the arcs, any equal flow problem with K pairs can be expressed in the above form. Note that the K pairs of arcs are mutually exclusive, i.e., an arc appears in at most one side constraint. We also assume, without loss of generality, that $u_k = u_{K+k}$ for $k = 1, 2, \ldots, K$.

Applications of the equal flow problem include crew scheduling [5], estimating driver costs for transit operations [14], and the two duty period scheduling problem [11]. When integrality constraints are not present, the model is referred to as the linear equal flow problem (P1). The linear model is applicable to problems where integrality is not restrictive, for example, in federal matching of funds allocated to various projects [4]. The linear equal flow problem may be solved using a specialization of the simplex method for networks with side constraints [3]. It has also been solved by transformation to a nonlinear programming problem [4].

The use of relaxation techniques and/or decomposition techniques in the solution of problems with special structure in the constraint set is motivated by potential computational efficiencies. Glover, Glover and Martinson [6] address a generalized network problem in which arcs in specified subsets must have proportional flow. The solution approach is via solution of a series of problem relaxations and progressive bound adjustment. The underlying principle is shared in the ensuing development for the equal flow problem.

Lagrangean  relaxation  has  been  used  to  aid  in  the  solution  of  the  integer  equal  flow 
problem  in  two  specific  instances.  Shepardson  and  Marsten  [11]  reformulate  the  two 
duty  period  scheduling  problem  as  a  single  duty  period  scheduling  problem  with  equal 
flow  side  constraints  and  integrality  constraints  on  the  variables.  Turnquist  and 
Malandraki  [14]  model  the  problem  of  estimating  driver  costs  for  transit  operations  as 
an  integer  equal  flow  problem.  In  both  studies,  the  side  constraints  are  dualized  and  the 
Lagrangean  dual  solved  using  subgradient  optimization  to  yield  a  lower  bound  on  the 
optimal  objective  value.  In  [14]  step-size  determination  during  the  subgradient  opti¬ 
mization  process  is  aided  by  a  line  search. 

The  objective  of  this  investigation  is  to  develop  and  computationally  test  a  new 
algorithm,  based  on  relaxation  and  decomposition,  for  the  linear  equal  flow  problem  and 
its  use  in  solving  the  integer  equal  flow  problem.  The  linear  equal  flow  problem  is  a 
natural  relaxation  for  the  integer  problem  and  also  provides  an  approximation  to  the 
integer  model.  Because  the  problems  are  very  closely  allied,  primarily  due  to  the 
unimodularity  of  the  node-arc  incidence  matrix,  solutions  to  the  integer  model  can  be 
obtained  by  using  a  slight  modification  of  the  technique  for  the  linear  model.  By  em¬ 
ploying  relaxation  and  decomposition,  solution  of  the  equal  flow  problem  is  via  two  se¬ 
quences  of  pure  network  problems,  totally  eliminating  the  computational  overhead 
associated  with  maintaining  the  inverse  of  a  basis  matrix.  The  exploitation  of  the  special 
structure  of  the  side  constraints  and  the  network  structure  results  in  a  decrease  in  both 
computer  storage  and  computation  time  since  reoptimization  procedures  are  applicable 
for  solution  of  subproblems  of  the  two  sequences. 

The solution technique consists of making use of the Lagrangean dual of the equal flow problem with the side constraints dualized to obtain a lower bound. The Lagrangean relaxation of the equal flow problem does not enforce the equal flow constraints. The Lagrangean dual for the linear and the integer equal flow problem is exactly the same, since the constraint set for the Lagrangean relaxation is identical. This Lagrangean dual is similar to the quadratic programming problem used in [4]. The similarity lies in penalizing violations of the equal flow constraints.

Upper bounds are obtained by use of a decomposition of the equal flow problem based on parametric changes in the requirements vector. The Lagrangean dual provides a lower bound which is used to aid the solution of the decomposition model in determining an initial right-hand-side allocation as well as providing a lower bound on the objective so that a solution is known to be within a percentage of the optimal. As such, the algorithm can terminate when a solution with a prespecified tolerance on the objective function value is obtained. By enforcing that the parametric changes in the requirements vector be such that integral allocations of equal flow be obtained, upper bounds on the integer problem can be obtained.

The  solution  technique  makes  use  of  subgradient  optimization  in  the  solution  of 
both  the  Lagrangean  dual  for  obtaining  a  lower  bound  and  the  decomposition  model  for 
obtaining  the  upper  bound.  Both  the  lower  and  upper  bounding  algorithms  have  been 
developed  in  the  context  of  the  general  subgradient  algorithm  which  is  briefly  presented 
in  Section  2.  Section  3  introduces  the  Lagrangean  dual  for  the  equal  flow  problem  and 
the  lower  bounding  algorithm.  Section  4  presents  the  decomposition  of  the  linear  equal 
flow  problem  and  the  upper  bounding  algorithm.  The  overall  procedure  which  makes 
use  of  the  algorithms  of  Sections  3  and  4  is  given  in  Section  5,  computational  results  are 
given  in  Section  6  and  conclusions  drawn  in  Section  7. 

2  The  Subgradient  Algorithm 


The subgradient algorithm was first introduced by Shor [13] and provides a framework for solving nonlinear programming problems. It may be viewed as a generalization of the steepest descent (ascent) method for convex (concave) problems in which the gradient may not exist at all points. At points at which the gradient does not exist, the direction of movement is given by a subgradient. Subgradients do not necessarily provide improving directions and consequently, the convergence results of Zangwill [15] do not apply. Convergence of the subgradient algorithm is assured, however, under fairly minor conditions on the step size.

Given the nonlinear program P0,

Minimize  $f(y)$
s.t.      $y \in G$

where f is a real-valued function that is convex over the compact, convex, and nonempty set G, a vector $\eta$ is a subgradient of f at y' if $f(y) - f(y') \ge \eta (y - y')$ for all $y \in G$. For any given $y' \in G$, the set of all subgradients of f at y' is denoted by $\partial f(y')$. Moving a sufficiently large distance s along $-\eta$ can yield a point $x = y' - s\eta$ such that $x \notin G$. The projection of the point x onto G, denoted by P[x], is defined to be the unique point $y \in G$ that is nearest to x with respect to the Euclidean norm. Using the projection operation, the subgradient algorithm in its most general form follows:

ALGORITHM 1: SUBGRADIENT OPTIMIZATION ALGORITHM

0  Initialization
   Let $y^0 \in G$.
   Select a set of step sizes $s_0, s_1, s_2, \ldots$
   $i \leftarrow 0$.

1  Find Subgradient
   Let $\eta_i \in \partial f(y^i)$.
   If $\eta_i = 0$, then terminate with $y^i$ optimal.

2  Move to new point
   $y^{i+1} \leftarrow P[y^i - s_i \eta_i]$
   $i \leftarrow i + 1$, and return to step 1.

There are three general schema which can be used in determining the step size when the subgradient algorithm is implemented for a specific problem:

i.   $s_i = \lambda_i$

ii.  $s_i = \lambda_i / \|\eta_i\|^2$

iii. $s_i = \lambda_i (f(y^i) - F) / \|\eta_i\|^2$

where F is an estimate of f*, the optimal value of f over G. A summary of the known convergence results for this algorithm may be found in [2] and [10].
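
As an illustration only, the general algorithm with step size schema (iii) can be sketched in a few lines of Python. The sketch assumes that G is a box, so that the projection is a componentwise clamp, and that the multiplier $\lambda_i$ is held constant; the names `projected_subgradient`, `project_box`, `target`, and `lam` are hypothetical and do not come from the paper.

```python
def project_box(x, lo, hi):
    # Euclidean projection onto the box lo <= y <= hi (componentwise clamp).
    return [min(max(xi, l), h) for xi, l, h in zip(x, lo, hi)]


def projected_subgradient(f, subgrad, y0, lo, hi, target, lam=1.0, max_iter=200):
    # f: convex objective; subgrad: returns one subgradient of f at y.
    # target: the estimate F of the optimal value used in schema (iii).
    y = list(y0)
    best_y, best_f = list(y), f(y)
    for _ in range(max_iter):
        eta = subgrad(y)
        norm_sq = sum(e * e for e in eta)
        if norm_sq == 0.0:
            break  # a zero subgradient certifies optimality
        step = lam * (f(y) - target) / norm_sq  # schema (iii)
        y = project_box([yi - step * ei for yi, ei in zip(y, eta)], lo, hi)
        fy = f(y)
        if fy < best_f:  # subgradient steps need not improve, so keep the best
            best_y, best_f = list(y), fy
    return best_y, best_f
```

Because a subgradient step is not guaranteed to decrease the objective, the sketch records the best point seen rather than returning the last iterate.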

3  The  Lower  Bound 

A lower bound on the objective function of the equal flow problem, I1 or P1, can be obtained by using the Lagrangean dual of the problem. The lower bound is used in the step size determination, termination criteria, and determination of an initial equal flow allocation for the upper bound procedure. Associating the Lagrange multiplier $w_k$ with the kth equal flow constraint and defining the K-vector $w = (w_1, w_2, \ldots, w_K)$, the Lagrangean dual for P1, referred to as problem D1, may be stated as

maximize  $h(w)$
          $w \in R^K$
where $h(w) = \min\{cx + \sum_k w_k (x_k - x_{K+k}) \mid Ax = b,\ 0 \le x \le u\}$. Since P1 is a linear program, it is easily established that the optimal objective values of P1 and D1 are equal and that any feasible solution to D1 provides a lower bound on the optimal objective value for problems P1 and I1. For any given value of the vector w, the Lagrangean relaxation is a pure network problem. A subgradient of h at a point w is given by the K-vector

$d = (x_1 - x_{K+1}, \ldots, x_K - x_{2K})$

where x solves the Lagrangean relaxation at w, given by

$\min\{cx + \sum_k w_k (x_k - x_{K+k}) \mid Ax = b,\ 0 \le x \le u\}$.

ALGORITHM 1 assumed the function f(y) to be convex, whereas h(w) is piece-wise linear concave. The lower bounding algorithm, ALGORITHM 2, modifies the framework of the previous algorithm for a concave function. The step sizes used are given by $\lambda_0 = p$, and successive values of $\lambda_i$ depend on the progressive improvement of the objective and a parameter m*. As long as the objective function continues to improve across m* iterations, the same value of the multiplier is retained. If the objective does not improve over m* iterations, the multiplier is halved, and successive iterations continue from the point where the incumbent best objective function value is found for the previous value of the multiplier. The algorithm makes use of a scalar, UBND, representing an upper bound for the problem. Since the solution procedure progressively improves both the lower bound and the upper bound for the equal flow problem, each time the lower bound algorithm is invoked the value for UBND is obtained from the upper bound procedure. For this algorithm, we assume that both bounds are greater than zero.

Several  termination  criteria  are  pertinent  to  the  lower  bound  algorithm.  If  the 
value  of  the  multiplier  becomes  negligibly  small,  further  improvement  in  the  lower  bound 
is  negligible.  Such  termination  criteria  are  relevant  particularly  to  the  initial  invocation 
of  the  lower  bound  algorithm  since  no  valid  estimate  of  the  upper  bound  is  available. 
Further,  the  maximum  number  of  iterations  allowed  in  the  initial  invocation  of  the  lower 
bound  procedure  should  be  chosen  to  be  larger  than  in  subsequent  invocations. 

ALGORITHM 2: LOWER BOUND ALGORITHM

1  Initialization
   Initialize UBND, step size p, m*, and tolerance $\varepsilon$.
   $w \leftarrow 0$, $m' \leftarrow 0$, $\|d'\| \leftarrow \infty$, $I \leftarrow 0$.

2  Find Subgradient
   $I \leftarrow I + 1$.
   Let x solve $h(w) = \min\{cx + \sum_k w_k (x_k - x_{K+k}) \mid Ax = b,\ 0 \le x \le u\}$.
   $d \leftarrow (x_1 - x_{K+1}, \ldots, x_K - x_{2K})$.
   If $\|d\| < \|d'\|$, $d' \leftarrow d$, $x' \leftarrow x$.
   If $d = 0$, terminate.
   If $h(w) <$ LBND,
      $m' \leftarrow m' + 1$,
      if $m' = m^*$, $p \leftarrow p/2$, $w \leftarrow w^*$, $d \leftarrow d^*$, $m' \leftarrow 0$;
   otherwise,
      $m' \leftarrow 0$,
      LBND $\leftarrow h(w)$,
      $w^* \leftarrow w$,
      $d^* \leftarrow d$.
   If (UBND - LBND) $\le \varepsilon$(UBND), terminate.

3  Move to new point
   (a) $w \leftarrow w + pd$.
   (b) If $\max_k \{p d_k\} < .005$, terminate.
   (c) Go to step 2.
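
The step-halving logic of the lower bound algorithm can be sketched as follows. This is a minimal illustration under stated assumptions, not the EQFLO code: the pure network subproblem is replaced by enumeration over a small hypothetical list `points` of candidate flows (standing in for the extreme points of the network polytope), ties with the incumbent are counted as non-improving so that the toy run cannot cycle, the test in step 3(b) is applied to absolute values, and the UBND-based termination test is omitted. The function name and all parameter names are hypothetical.

```python
def lagrangean_lower_bound(c, points, K, p=1.0, m_star=5, tol=0.005, max_iter=500):
    def relax(w):
        # Lagrangean relaxation: min over candidates of cx + sum_k w_k (x_k - x_{K+k}).
        def value(x):
            penalty = sum(w[k] * (x[k] - x[K + k]) for k in range(K))
            return sum(ci * xi for ci, xi in zip(c, x)) + penalty
        x = min(points, key=value)
        return value(x), x

    w = [0.0] * K
    lbnd, w_best, d_best, stall = float("-inf"), list(w), None, 0
    for _ in range(max_iter):
        h, x = relax(w)
        d = [x[k] - x[K + k] for k in range(K)]  # subgradient of h at w
        if not any(d):
            return max(lbnd, h), w  # equal flows satisfied: x solves the problem
        if h <= lbnd:
            stall += 1
            if stall == m_star:
                # Halve the step and restart from the incumbent multipliers.
                p, w, d, stall = p / 2, list(w_best), list(d_best), 0
        else:
            lbnd, w_best, d_best, stall = h, list(w), list(d), 0
        w = [wk + p * dk for wk, dk in zip(w, d)]  # ascent step for the concave dual
        if max(abs(p * dk) for dk in d) < tol:
            break
    return lbnd, w_best
```

On a three-point example with one equal flow pair, costs (1, 3), and candidate flows (0, 2), (2, 0), (1, 1), the dual function is $h(w) = \min\{6 - 2w,\ 2 + 2w,\ 4\}$, whose maximum value 4 is attained at $w = 1$.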

The choice of the initial value of p should be directed by the range of objective function coefficients for the problem as well as an estimate of the elements of the vector d. This choice can be made automatically when the Lagrangean relaxation is solved with w = 0. Since it is the elements of d which cause the objective function coefficients to change in each successive iteration of the subgradient optimization procedure, an initial value of p which keeps objective function coefficients from taking on values far away from the original range is a prudent choice. Note that termination of the lower bound procedure can occur when further changes in objective function coefficients are minimal, as in step 3(b).

4  The  Upper  Bound

An alternate formulation of problem P1, referred to as P2, obtained by decomposing the problem is given by

Minimize  $g(y)$
s.t.      $y \in S$

where for any vector $y = (y_1, y_2, \ldots, y_K)$,

$g(y) = \min\{cx \mid Ax = b;\ 0 \le x \le u;\ x_k = x_{K+k} = y_k,\ k = 1, 2, \ldots, K\}$,

and

$S = \{y \mid 0 \le y_k \le u_k \text{ for } k = 1, 2, \ldots, K\}$.

The decomposition assures the satisfaction of the equal flow constraints. The decomposed problem P2 is equivalent to the problem P1 [12] and may be solved using a specialization of the subgradient optimization algorithm. The objective function is piece-wise linear convex and the subgradient of g at a point y is obtained from the dual variables, $v_i$, $i = 1, 2, \ldots, 2K$, associated with the equal flow constraints in the subproblem, referred to as P3 and given by,

Minimize  $cx$
s.t.      $Ax = b$
          $x_1 = y_1$           $(v_1)$
          $x_{K+1} = y_1$       $(v_{K+1})$
          $\vdots$
          $x_K = y_K$           $(v_K)$
          $x_{2K} = y_K$        $(v_{2K})$
          $0 \le x \le u$.

The K-vector

$\eta = (v_1 + v_{K+1},\ v_2 + v_{K+2},\ \ldots,\ v_K + v_{2K})$

is a subgradient of g at $y = (y_1, y_2, \ldots, y_K)$.

The dual variables $v_k$, $k = 1, 2, \ldots, 2K$, may be easily constructed from the solution to the pure network problem, referred to as problem P4,

$\min\{cx \mid Ax = b,\ \gamma \le x \le \theta\}$,

where the lower and upper bound n-vectors $\gamma$ and $\theta$ are defined by

$\gamma_k = \theta_k = y_k$,              $k = 1, 2, \ldots, K$
$\gamma_{K+k} = \theta_{K+k} = y_k$,      $k = 1, 2, \ldots, K$
$\gamma_k = 0,\ \theta_k = u_k$,          $k = 2K+1, \ldots, n$.

Let $\pi$ be the vector of optimal dual variables associated with the conservation of flow constraints, $Ax = b$, in P4, and let the arc associated with the variable $x_j$ be incident from node $j_f$ and incident to node $j_t$. The optimal dual variables for P3 are given by,

$v_k = -\pi_{k_f} + \pi_{k_t} + c_k$,  $k = 1, 2, \ldots, 2K$.

In using the subgradient optimization algorithm for the decomposed problem, at each point y the subgradient $\eta$ can be calculated directly using the above development.
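
The construction of $\eta$ from the node duals can be sketched as below. The sketch follows the sign convention of the formula above; other network codes may define node potentials with the opposite sign, so this should be checked against the solver in use. The function name and the representation of arcs as `(from_node, to_node)` pairs indexed from zero are assumptions made for illustration.

```python
def equal_flow_subgradient(pi, arcs, cost, K):
    # pi:   node duals from the pure network problem P4 (indexable by node)
    # arcs: arcs[k] = (from_node, to_node); the first 2K arcs are the paired arcs
    # cost: unit cost of each arc
    # v_k = -pi[from] + pi[to] + c_k for each of the 2K paired arcs
    v = [-pi[f] + pi[t] + cost[k] for k, (f, t) in enumerate(arcs[:2 * K])]
    # eta_k = v_k + v_{K+k}
    return [v[k] + v[K + k] for k in range(K)]
```

For example, with node duals (0, 2, 5), paired arcs (0, 1) and (1, 2), and costs 3 and 1, the two arc duals are 5 and 4 and the single subgradient component is 9.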

It is possible that moving a step along the negative subgradient yields a point which does not belong to the set S. As pointed out in Section 2, this point is projected onto the set S by means of a projection operation in the algorithm. For this model, the projection operation decomposes on k so that $P[y] = (P[y_1], P[y_2], \ldots, P[y_K])$, where the projections $P[y_k]$ are defined by:

if $y_k < 0$, $P[y_k] = 0$;

if $y_k > u_k$, $P[y_k] = u_k$;

if $0 \le y_k \le u_k$, $P[y_k] = y_k$.

The subgradient optimization algorithm for problem P2 makes use of a lower bound, LBND, on the optimal objective value which is used in step size determination using a variant of scheme (iii) given in Section 2, as well as in the termination criteria. Again, we assume that both bounds are greater than zero.


ALGORITHM 3: UPPER BOUND ALGORITHM

1  Initialization
   Select $y \in S$ and construct $\gamma$ and $\theta$.
   Initialize LBND, $\varepsilon$, q, n*, $J \leftarrow 0$.

2  Find subgradient and step size
   $J \leftarrow J + 1$.
   Let x and $\pi$ be the vectors of optimal primal and dual variables
   for $\min\{cx \mid Ax = b,\ \gamma \le x \le \theta\}$.
   If $cx >$ UBND,
      $n' \leftarrow n' + 1$,
      if $n' = n^*$, $q \leftarrow q/2$, $n' \leftarrow 0$;
   otherwise,
      $n' \leftarrow 0$,
      UBND $\leftarrow cx$.
   If (UBND - LBND) $\le \varepsilon$(UBND) and x feasible, terminate with x optimal;
   otherwise,
      $v_k \leftarrow -\pi_{k_f} + \pi_{k_t} + c_k$,  $k = 1, 2, \ldots, 2K$,
      $\eta \leftarrow (v_1 + v_{K+1}, \ldots, v_K + v_{2K})$.

3  Move to new point
   (a) $y \leftarrow P[y - q((\text{UBND} - \text{LBND}) / \|\eta\|^2)\,\eta]$.
   (b) If $\max_k \{q((\text{UBND} - \text{LBND}) / \|\eta\|^2)\,\eta_k\} < .01$, then terminate.
   (c) Go to 2.
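
A compact sketch of the upper bound iteration is given below, assuming a hypothetical callable `solve_sub(y)` that stands in for the pure network subproblem and returns the subproblem cost together with a subgradient $\eta$ at y. As assumptions made here rather than taken from the paper, ties with the incumbent are treated as non-improving, the feasibility check on x is omitted (the toy subproblem is always feasible), and the test in step 3(b) is applied to absolute values.

```python
def upper_bound(solve_sub, y, u, lbnd, q=1.0, n_star=10, eps=0.01,
                tol=0.01, max_iter=200):
    ubnd, best_y, stall = float("inf"), list(y), 0
    for _ in range(max_iter):
        cost, eta = solve_sub(y)
        if cost >= ubnd:
            stall += 1
            if stall == n_star:
                q, stall = q / 2, 0  # no progress over n* iterations: halve q
        else:
            ubnd, best_y, stall = cost, list(y), 0
        if ubnd - lbnd <= eps * ubnd:
            break  # incumbent is within the relative tolerance of the lower bound
        norm_sq = sum(e * e for e in eta)
        if norm_sq == 0:
            break
        step = q * (ubnd - lbnd) / norm_sq
        move = [step * e for e in eta]
        # Projection onto S decomposes into a clamp of each y_k to [0, u_k].
        y = [min(max(yk - mk, 0.0), uk) for yk, mk, uk in zip(y, move, u)]
        if max(abs(m) for m in move) < tol:
            break
    return best_y, ubnd


def toy_sub(y):
    # Hypothetical stand-in for g(y): piece-wise linear convex, minimum 10 at (3, 1).
    sgn = lambda t: (t > 0) - (t < 0)
    cost = 2 * abs(y[0] - 3) + abs(y[1] - 1) + 10
    return cost, [2 * sgn(y[0] - 3), sgn(y[1] - 1)]
```

Calling `upper_bound(toy_sub, [0.0, 4.0], [5.0, 4.0], lbnd=10.0)` drives the incumbent toward the minimum value of 10 and stops once the relative gap falls below `eps`.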

The use of the algorithm parameters q and n* is to help condition the step sizes based on the relative norm of the subgradient with respect to the difference in the lower and upper bounds. The norm of the subgradient is dependent on the problem rather than the algorithm. That is, it is quite possible that the norm remains high throughout the algorithm. The initial choice of q is directed by an estimate of the maximum of the absolute values of the elements of the vector d' as well as the objective function coefficient associated with artificial variables in the solution of the pure network problem. When allocations yield infeasible solutions, the elements of $\eta$ are large, rendering $\|\eta\|^2$ very large. An initial value of q, if chosen arbitrarily, can be small, thus requiring more iterations since the improvement at each iteration is small. On the other hand, if the initial value of q is large, then for several sets of n* iterations no improvement in the objective occurs until the value of q becomes smaller. Here again, a secondary termination criterion in Step 3(b) is when further changes in the allocation in step 3(a) are minimal.

The modification required for the integer problem occurs only when the termination criteria have been met for the linear problem. The alternate formulation for the integer problem is obtained by requiring that the equal flow allocation, y, be integral:

Minimize  $g(y)$
s.t.      $y \in S'$

where for any vector $y = (y_1, y_2, \ldots, y_K)$,

$g(y) = \min\{cx \mid Ax = b;\ 0 \le x \le u;\ x_k = x_{K+k} = y_k,\ k = 1, 2, \ldots, K\}$,

and

$S' = \{y \mid 0 \le y_k \le u_k \text{ for } k = 1, 2, \ldots, K \text{ and } y \text{ integer}\}$.

Once termination occurs for the linear problem, the upper bound algorithm can be run by requiring that the projection in step 3 of the algorithm yield an integer equal flow allocation. Since the objective retains the piece-wise convex nature of the objective of the linear problem, the linear optimum obtained can be expected to be close to the integer optimum. Adjacent integer allocations can be expected to provide bounds on the integer optimum or else be near-feasible points for the integer problem.
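
The text does not state how the projection is made to yield an integer allocation. One simple possibility, shown purely as an assumption, is to clamp each component to $[0, u_k]$ and then round to the nearest integer; with integral $u_k$ the rounded value stays within the bounds. The function name is hypothetical.

```python
import math


def project_integer(y, u):
    # Clamp each y_k to [0, u_k], then round half up to the nearest integer.
    # Assumes every u_k is a nonnegative integer, so the result remains in S'.
    out = []
    for yk, uk in zip(y, u):
        clamped = min(max(yk, 0.0), float(uk))
        out.append(int(math.floor(clamped + 0.5)))
    return out
```

For instance, the allocation (-0.3, 2.5, 7.9, 1.2) with bounds (3, 3, 5, 4) maps to (0, 3, 5, 1).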

5  The  Algorithm 

The  solution  of  the  equal  flow  problem  using  decomposition,  as  given  in  the  pre¬ 
vious  section  can  be  implemented  without  the  lower  bound  procedure.  It  is  also  possible 
to  implement  the  lower  bound  algorithm  independently  for  the  purpose  of  obtaining  a 
lower  bound  on  the  optimal  value  of  the  equal  flow  problem.  For  the  upper  bound 
problem, some measure of the lower bound on the problem must be used to aid in
termination. By merging the two procedures, an algorithm which adjusts the lower and
upper  bounds  progressively  can  be  used  to  advantage  and  tied  to  the  accuracy  desired 
for  the  solution.  Not  only  can  such  a  procedure  be  used  for  obtaining  feasible  solutions 
with  relative  ease,  but  it  can  also  provide  a  measure  of  how  close  this  solution  is  to  the 
optimal. 

The algorithm for the solution of the equal flow problem iterates between the lower bound procedure and the upper bound procedure. The lower and upper bounds, LBND and UBND, progressively become tighter, closing in on the optimal solution to the problem. Each time the lower bound procedure is invoked, a maximum of ITERL iterations are performed. Each time the upper bound procedure is invoked, a maximum of ITERU iterations are performed. However, the initial invocations of the lower and upper bound algorithms are allowed to terminate using criteria in those algorithms as opposed to these iteration counts. The initial invocation of the Lagrangean dual is important primarily because it affords a very tight lower bound on the objective value of the integer or linear optimum and further it provides near-optimal values of the Lagrange multipliers. The near-optimal values of the Lagrange multipliers tend to aid the subgradient optimization of the decomposition model. The tuning parameters for the algorithm are as follows: ITERL, ITERU, m*, n*, and $\varepsilon$ (the termination criterion).

ALGORITHM 4: RELAXATION/DECOMPOSITION ALGORITHM
FOR THE EQUAL FLOW PROBLEM

0  Initialization
   Initialize ITERL, ITERU, $\varepsilon$.
   $T \leftarrow 0$, $R \leftarrow 0$, $w \leftarrow 0$, UBND $\leftarrow \infty$, LBND $\leftarrow -\infty$.
   Call ALGORITHM 2 and $y_k \leftarrow \min\{u_k,\ (x'_k + x'_{K+k})/2\}$, $k = 1, 2, \ldots, K$.
   Call ALGORITHM 3.

1  Compute Lower Bounds
   (a) Call ALGORITHM 2 (Steps 2 and 3(a)).
   (b) $T \leftarrow T + 1$.
       If $T <$ ITERL, then go to step 1(a).

2  Compute Upper Bounds
   (a) Call ALGORITHM 3 (Steps 2 and 3(a)).
   (b) $R \leftarrow R + 1$.
       If $R <$ ITERU, then go to step 2(a).

3  Reset iteration counts
   $T \leftarrow 0$, $R \leftarrow 0$, and go to step 1.

6  Computational  Experience 

The computer implementation of the algorithm is written in standard FORTRAN (called EQFLO) and makes use of MODFLO [1] to solve pure network subproblems. Based on NETFLO [8], MODFLO is a set of subroutines which allows parametric changes in costs, bounds and/or requirements for a network problem and subsequent reoptimization. Computational testing was carried out on the IBM 3081D at The University of Texas at Austin using the FORTVS compiler with OPT = 2. In order to assess the computational gains afforded by the decomposition/relaxation algorithm for the equal flow problem, each problem was solved using MPSX [7]. All MPSX solutions have been obtained on the IBM 3081D at Southern Methodist University.

The algorithm has been tested on a set of 10 test problems generated by using NETGEN [9], and referred to by their NETGEN numbers. Of the 10 problems used, the first three are transportation problems (problems 5, 9, and 10), the next four are capacitated transshipment problems (problems 20, 21, 24, and 25) and the last three are uncapacitated transshipment problems (problems 28, 30, and 35). The test problems have between 200 and 1500 nodes, and between 1500 and 6600 arcs. For each problem, the first 2K arcs were paired to form K equal flow side constraints. In order to gauge the performance of the algorithm for various values of K, some of the problems were generated using the same base network problem data with K varying from 75 to 200.

The benchmark NETGEN problems have a specified percentage of arcs which are
uncapacitated. For these arcs, the capacity was defined to be the maximum of all
supplies and demands. For arcs in equal flow pairs which emanate from supply points, the
capacity used is the supply at the point of incidence. Similarly, for arcs incident to
demand points, the capacity used is the corresponding demand. If an equal flow pair is
incident to a demand point or incident from a supply point, then the capacity assigned
is the integer ceiling of half the corresponding requirement. Such allocation of
capacity is prudent, allowing a tighter relaxation.

Table I details the computational testing of the algorithm with parameters m* = 5,
n* = 10, ITERU = ITERL = 10, ε = .01. For the test problems, EQFLO obtained
feasible solutions whose objective function values were within 1% of the optimal
in a fraction of the time required by MPSX to obtain an optimum. The table reports
the total solution times required to produce an α percent solution for the linear equal
flow problem and an integer solution. The numbers of lower and upper bound
iterations required, I and J respectively, are provided along with the norm, ||d'||,
obtained during subgradient optimization of the Lagrangean dual. Note that this norm
typically provides a metric for gauging the difficulty of a specific problem instance.
Because the lower bound procedure does not enforce equal flow, the norm
provides a measure of the infeasibility of the equal flow constraints, or the amount of
imbalance which exists in the problem. The zero-tolerance used for flows on artificial arcs
is .05. Feasible solutions were obtained for the linear problem in all 10 cases.
The termination criteria employed for this computational testing are stringent: the
decomposition algorithm continues to attempt improvement not only until the solution is
within 1% of the lower bound, but also until changes in subsequent allocations are
negligible (.001).

Initial allocations, as determined by x', obtain feasible solutions well within 1% of
the lower bound obtained in 6 of the 10 problems. For the other problems, the initial
allocation can be feasible or infeasible. Capacities in the randomly generated problems
are such that infeasibility occurs for the following reason: when a particular level of
allocation is enforced, the problem can become infeasible because capacities fall below
the level required to ensure that all demand is met.

An upper limit of 5 iterations was allowed in performing integer equal flow
allocations, with the initial integer allocation obtained by truncating the optimal linear
allocation. No more than 29 units of demand go unsatisfied, corresponding to 99.99%
feasible integer solutions well within 1% of the lower bound obtained. The trade-off
between enforcing integer equal flows and 100% feasible solutions tips in favour of
making use of near-feasible solutions, given the relative computational ease with which
they are produced. Problem 5 was attempted with MPSX-MIP where integrality was
only forced on the 75 equal flow pairs. After over 220 seconds the active
branch-and-bound tree had over 2000 nodes and had not as yet obtained the first feasible
integer solution. In less than 9 seconds, the decomposition procedure obtained an integer
solution which satisfied 399,996 units of the 400,000 units of demand.

To determine the impact of an increase in the number of side constraints on
problem characteristics and the algorithm, additional testing with problems 21, 24, and 28
is reported in Table II. Each of these base problems was used to generate equal flow
problems with 75, 100, 150, and 200 equal flow constraints. As is evident from the behavior
of the norm of d', as the number of side constraints increases, more imbalance is
introduced into the problem and more effort is required to enforce equal flow. Problem 24
becomes infeasible once the number of side constraints enforced reaches 200. As would
be expected, the algorithm expends more effort for the more tightly constrained
problems, with the exception that it recognizes an infeasible problem readily. Again, for the
problems which are feasible, near-feasible integer solutions are obtained in
approximately 1%-60% of the time required to solve the linear problem using MPSX.


The equal flow problem lends itself to solution by decomposition and relaxation.
The use of these techniques in the solution procedure developed is advantageous because
the essential solution mechanism required is the solution of sequences of pure network
problems. By dispensing with the working basis required by other techniques, not only
are computational efficiencies afforded but the natural characteristics of the problem
are also enhanced.

The  algorithm  has  been  shown  to  assist  in  solving  the  integer  equal  flow  problem. 
The  lower  bound  automatically  produces  integer  flows  and  the  projection  of  the  sub¬ 
gradient  in  the  upper  bound  routine  is  altered  to  require  integrality  on  the  equal  flow 
allocation  once  a  near-optimal  linear  solution  has  been  obtained.  The  equal  flow  allo¬ 
cation  for  the  linear  model  is  expected  to  be  close  to  the  equal  flow  allocation  for  the 
integer  model  due  to  problem  structure.  Thus  the  solution  procedure  provides  near- 
feasible,  near-optimal  solutions  for  the  integer  equal  flow  problem  efficiently. 

The structure of the equal flow problem provides a metric on the relative difficulty
of any specific problem instance. The proposed solution procedure has the innate
capability to distinguish between easy and difficult instances of an equal flow problem and
thus can require only 1% of the MPSX time to solve an easy problem. As the number
of equal flow constraints grows, a problem can become progressively infeasible, since the
enforcement of equal flows can serve to reduce the capacity of a cut-set of the network
to well below the levels required for feasibility. The development for the linear equal flow
problem in this paper can be instructive in modelling and solving other network
problems with specially structured side constraints, such as proportional flow models used in
manpower planning. The solution technique is best suited for a real-world situation in
which one must quickly produce near-optimal, near-feasible solutions.

Table I. Comparison of EQFLO with MPSX (All Problems Have 75 Equal Flow Pairs)


NETGEN                     MPSX    ----------- LINEAR -----------   -- INTEGER --
Number   Nodes    Arcs     Time      α   ||d'||     I     J    Time   Infeas   Time

   5        ?     3100     11.4    0.15     377   151     1     8.6      4      8.8
   9        ?     6395     38.4    0.01    1296   145     1    19.6      3     19.9
  10        ?     6611     34.2    0.00     698   165     1    15.3      3     15.7
  20      400     1484     34.8    0.10    4729   544   262    23.3      2     23.6
  21      400     2904     73.8    0.00       1     6     1     0.7      1      1.1
  24        ?     1398     37.8    0.29    8875   280   498    21.4      3     21.7
  25        ?     2692     93.0    0.01    8356   148     1     5.7      1      6.3
  28     1000     3000     52.8    0.14    3865   235   220    27.8     29     28.3
  30     1000     4500     69.6    0.00     490    95     1     7.9      0      7.9
  35     1500     5880    145.2    0.09    5386   218   151    46.7      6     47.7

Total                     591.0                                177.0            181.0

? = entry illegible or missing in the source copy.

Times reported are in CPU seconds on an IBM 3081D.


Table II. Effect of Increasing the Number of Equal Flow Pairs.

NETGEN   Equal Flow    MPSX    ----------- LINEAR -----------   -- INTEGER --
Number     Pairs       Time      α   ||d'||     I     J    Time   Infeas   Time

  21         75        73.8    0.00       1     6     1     0.7      ?      1.1
  21        100        64.8    0.00       1     6     1     0.8      ?      1.1
  21        150        83.4    0.08     880   130   202    14.6      ?     15.1
  21        200        76.2    0.47    2937   214   188    17.5      ?     17.9
  24         75        37.8    0.29    8875   280   498    21.4      ?     21.7
  24        100        36.0    0.53   18283   240   307    19.8      ?     19.9
  24        150        42.0    2.32   24867   258   505    30.7      ?     30.9
  24        200        infeasible                                        infeasible
  28         75        52.8    0.14    3865   233   220    27.8      ?       ?
  28        100        57.6    0.23    5206   213   182    28.3      ?       ?
  28        150        65.4    0.30    6043   202   179    30.4      ?       ?
  28        200        72.0    1.00   15206   224   423    50.5      ?       ?

Total                 661.8                                262.5

? = entry missing in the source copy or not assignable to a row. The integer
infeasibility values that survive in the source, without recoverable row
assignment, are 2, 26, 3, 5, and 37.

ABSTRACT

This paper presents a parallelization of the simplex method for linear programming.
Current implementations of the simplex method on sequential computers are based on a
triangular factorization of the inverse of the current basis. An alternative decomposition
designed for parallel computation, called the quadrant interlocking factorization, has
previously been proposed for solving linear systems of equations. This research presents the
theoretical justification and algorithms required to implement this new factorization in a
simplex-based linear programming system.

I. INTRODUCTION


The introduction of parallel computers into scientific computing in the past decade
is the beginning of a new era. The invention of new algorithms will be required to ensure
realization of the potential of these and future architectural improvements in computers.
Already the use of parallel computers has given rise to studies in concurrency factors,
vectorization, and asynchronous procedures. These have led to multifold increases in
speed over conventional serial machines after the calculations have been rearranged to
take advantage of the specific hardware. This paper presents a parallelization of the
simplex algorithm for general linear programs. Our work begins with new results for solving
systems of linear equations and is directed toward the hardware design currently adopted
by Sequent Computer Systems, Inc. of Beaverton, Oregon.

The following notation is used throughout this paper. Let $B_{i:j,\,k:l}$ represent a
submatrix of B composed of rows i through j and columns k through l. If i = j (k = l), we
write $B_{i,\,k:l}$ ($B_{i:j,\,k}$). The jth row (column) of B is denoted by $B_{j\cdot}$
($B_{\cdot j}$). The (i,j)th element of B is $B_{i,j}$.

The linear programming problem is represented mathematically as follows:

$$\text{minimize } c^T x \quad \text{subject to } Ax = b, \quad 0 \le x \le u,$$

where A is a known m by n matrix, all other quantities are conformable, and all vectors
are known except x.

The  upper  bounded  version  of  Dantzig’s  simplex  method  for  solving  the  linear  pro¬ 
gramming  problem  may  be  stated  as  follows: 


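0. Initialization

Let $[x^B \mid x^N]$ be a basic feasible solution with $A = [B \mid N]$. Let the cost vector
$[c^B \mid c^N]$ and bounds $[u^B \mid u^N]$ be partitioned similarly. Assume that $B^{-1}$ is
available in some factored form. Initialize iter to 0 and the reinversion frequency,
freq.

1. Calculate the Dual Variables (BTRAN)

$$\pi \leftarrow c^B B^{-1}. \qquad (1.1)$$

2. Pricing

Let $K_1 = \{ j : x_j^N = 0 \text{ and } c_j^N - \pi N_j < 0 \}$,
and $K_2 = \{ j : x_j^N = u_j^N \text{ and } c_j^N - \pi N_j > 0 \}$.

If $K_1 \cup K_2 = \emptyset$, terminate with $[x^B \mid x^N]$ optimal;
otherwise, select $k \in K_1 \cup K_2$ and set

$$\delta \leftarrow \begin{cases} 1, & \text{if } k \in K_1 \\ -1, & \text{otherwise.} \end{cases}$$

3. Column Update (FTRAN)

$$y \leftarrow B^{-1} N_k. \qquad (1.2)$$

4. Ratio Test

$$\Delta_1 \leftarrow \min_{\operatorname{sign}(y_j) = \operatorname{sign}(\delta)} \left\{ \frac{x_j^B}{|y_j|} \right\},$$

$$\Delta_2 \leftarrow \min_{\operatorname{sign}(y_j) = \operatorname{sign}(-\delta)} \left\{ \frac{u_j^B - x_j^B}{|y_j|} \right\},$$

$$\Delta \leftarrow \min \{ \Delta_1, \Delta_2, u_k^N \}.$$

5. Right Hand Side Update

$$x_k^N \leftarrow x_k^N + \Delta\delta,$$

$$x^B \leftarrow x^B - \Delta\delta\, y.$$

If $\Delta = u_k^N$, return to 1.

6. Basis Inverse Update

Let p denote the index of $x^B$ which produced $\Delta$ and set

$$\eta_i \leftarrow \begin{cases} -y_i / y_p, & \text{if } i \ne p \\ 1 / y_p, & \text{otherwise,} \end{cases}$$

$$E \leftarrow I - e_p e_p^T + \eta e_p^T,$$

$$\bar{B}^{-1} \leftarrow E B^{-1}. \qquad (1.3)$$

7. Reinversion Check

iter ← iter + 1.

If mod(iter, freq) = 0, then refactor $\bar{B}^{-1}$.

Return to 1 using $\bar{B}^{-1}$ as $B^{-1}$, the current basis inverse.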
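
The basis inverse update of step 6 can be checked numerically on a small dense example. The sketch below is illustrative only: it stores the inverse explicitly rather than in factored form, and the function name `eta_update` is an assumption, not part of any code described in this paper.

```python
import numpy as np

def eta_update(B_inv, y, p):
    """Return E @ B_inv, where E = I - e_p e_p^T + eta e_p^T as in (1.3).

    y is the updated entering column B^{-1} N_k and p (0-based) is the
    position of the leaving variable; y[p] must be nonzero.
    """
    m = B_inv.shape[0]
    eta = -y / y[p]          # eta_i = -y_i / y_p for i != p
    eta[p] = 1.0 / y[p]      # eta_p = 1 / y_p
    E = np.eye(m)
    E[:, p] = eta            # identity with column p replaced by eta
    return E @ B_inv
```

Replacing column p of B by the entering column and inverting from scratch should give the same matrix as `eta_update(B_inv, y, p)`, which is what makes the product-form update usable between reinversions.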

Two  of  the  most  common  factorizations  of  the  basis  matrix  inverse  are  the  product 
form  and  the  elimination  form,  which  correspond  to  the  methods  for  solution  of  linear 
equations  known  as  Gauss-Jordan  reduction  and  Gauss  reduction  (LU  factorization), 
respectively,  where  L  is  a  lower  triangular  matrix  and  U  is  an  upper  triangular  matrix. 
The  elimination  form  produces  a  sparser  representation  of  the  basis  inverse  than  the  pro¬ 
duct  form,  and  accordingly  leads  to  faster  implementation  of  a  simplex  iteration  and  a 
considerable  savings  in  storage. 

Historically, the elimination form of the inverse, due to Markowitz [1957-1], was
the first LU factorization method and was introduced to preserve sparsity during
reinversion. However, once reinversion was completed, further pivot operations were handled
using product form. Bartels and Golub proposed updating L and U in a numerically
stable way (see Bartels [1971-1]). Their updating scheme tends to promote the growth of
nonzeros in U, leading to a potentially severe loss of sparsity. Forrest and Tomlin
[1972-1] designed a different updating scheme for the triangular factors to preserve sparsity at
some sacrifice in numerical stability. Subsequent implementations of the Bartels-Golub
method, designed by Reid [1982-1] and Saunders [1976-1], combine the virtues of
accuracy and speed.

Several  parallel  versions  of  the  LU  factorization  algorithm  for  solving  general  linear 
systems  of  equations  are  presently  available  (Chen  et  al.  [1984-1]  and  Dongarra  and 
Sorensen  [1984-2]).  All  versions  are  based  on  restructuring  the  original  serial  algorithm 
to  reveal  possible  independent  tasks  that  can  be  carried  out  concurrently. 

Evans  and  Hatzopoulos  [1979-1]  proposed  a  matrix  factorization,  called  the  Qua¬ 
drant  Interlocking  Factorization  (QIF),  as  an  appropriate  tool  for  solving  linear  systems 
on  parallel  computers.  The  QIF  is  similar  to  the  LU  factorization,  but  is  claimed  to  be 
more  suitable  for  concurrent  computation. 

This paper presents a parallelization of the simplex method using the QIF. The
outline of the paper is as follows. In Section II, the QIF is developed. An algorithm for
updating the QIF of $B^{-1}$ is presented in Section III. Mathematically, the problem is to
efficiently obtain a factorization of $\bar{B}^{-1}$ (see step 6 of Algorithm 1.1) from the
factorization of $B^{-1}$. In Section IV, we develop a parallelization of the reinversion routine
used in step 7 and propose a parallel implementation of both the BTRAN and FTRAN
operations of steps 1 and 3.

The parallel algorithms presented in this study are designed for a MIMD parallel
computer that incorporates p identical processors sharing a common memory and
capable of applying all their power to a single job in a timely and coordinated manner. The
Balance Systems 8000 and 21000 from Sequent Computer Systems are examples of such
machines.


II.  THE  QUADRANT  INTERLOCKING  FACTORIZATION 


In this section we describe a matrix factorization suggested by Evans and
Hatzopoulos [1979-1] known as the Quadrant Interlocking Factorization (QIF). This
decomposition is designed to solve linear systems on parallel computers (see Evans and
Hatzopoulos [1979-1], Evans and Hadjidimos [1980-1], Evans [1982-1] and Feilmeier
[1982-1]). The factors and some of their characteristics are described in Section 2.1. We show
that any nonsingular matrix can be factorized into its QIF in two ways, the Forward QIF
and the Backward QIF. The factorization algorithms are developed in Sections 2.2 and
2.3. The relationship of quadrant and triangular matrices is presented in Section 2.4.


2.1  The  Quadrant  Interlocking  Factors 


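Consider the following matrix

$$
W = \begin{bmatrix}
1 & 0 & \cdots & 0 & 0 \\
w_{2,1} & 1 & \cdots & 0 & w_{2,m} \\
w_{3,1} & w_{3,2} & \cdots & w_{3,m-1} & w_{3,m} \\
\vdots & \vdots & & \vdots & \vdots \\
w_{m-2,1} & w_{m-2,2} & \cdots & w_{m-2,m-1} & w_{m-2,m} \\
w_{m-1,1} & 0 & \cdots & 1 & w_{m-1,m} \\
0 & 0 & \cdots & 0 & 1
\end{bmatrix}
$$

Note that the non-arbitrary entries of W are given by

$$W_{i,j} = 1, \quad i = j;$$

$$W_{i,j} = 0, \quad i = 1,\ldots,[m/2], \quad j = (i+1),\ldots,(m-i+1); \qquad (2.1)$$

$$W_{i,j} = 0, \quad i = \bar{m},\ldots,m, \quad j = m-i+1,\ldots,i-1; \qquad (2.2)$$

where

$[x]$ = the largest integer not greater than the value of x,
$\bar{m} = m + 1 - [m/2]$.

Also, consider the matrix

$$
Z = \begin{bmatrix}
z_{1,1} & z_{1,2} & \cdots & z_{1,m-1} & z_{1,m} \\
0 & z_{2,2} & \cdots & z_{2,m-1} & 0 \\
0 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & & \vdots & \vdots \\
0 & 0 & \cdots & 0 & 0 \\
0 & z_{m-1,2} & \cdots & z_{m-1,m-1} & 0 \\
z_{m,1} & z_{m,2} & \cdots & z_{m,m-1} & z_{m,m}
\end{bmatrix}
$$

Note that

$$Z_{i,j} = 0, \quad j = 1,\ldots,[(m-1)/2], \quad i = j+1,\ldots,m-j; \qquad (2.3)$$

$$Z_{i,j} = 0, \quad j = [m/2]+2,\ldots,m, \quad i = m+2-j,\ldots,j-1. \qquad (2.4)$$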


Any square matrix may be partitioned by its diagonal and secondary diagonal into
four quadrants. The potentially nonzero elements of W are in the left and right quadrants
while those of Z are in the upper and lower quadrants. Therefore, we call any square
matrix whose nonzero structure follows (2.1) and (2.2), or one that can be brought to
such a form by row and/or column interchanges, a left-right quadrant (LRQ) matrix.
Similarly, any square matrix whose nonzero structure follows (2.3) and (2.4), or one that
can be brought to such a form by row and/or column interchanges, is called an
upper-lower quadrant (ULQ) matrix. Examples of W and Z matrices for an odd and an even m
are given below:


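Example 2.1 (m = 5)

$$
W = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
w_{2,1} & 1 & 0 & 0 & w_{2,5} \\
w_{3,1} & w_{3,2} & 1 & w_{3,4} & w_{3,5} \\
w_{4,1} & 0 & 0 & 1 & w_{4,5} \\
0 & 0 & 0 & 0 & 1
\end{bmatrix},
\quad
Z = \begin{bmatrix}
z_{1,1} & z_{1,2} & z_{1,3} & z_{1,4} & z_{1,5} \\
0 & z_{2,2} & z_{2,3} & z_{2,4} & 0 \\
0 & 0 & z_{3,3} & 0 & 0 \\
0 & z_{4,2} & z_{4,3} & z_{4,4} & 0 \\
z_{5,1} & z_{5,2} & z_{5,3} & z_{5,4} & z_{5,5}
\end{bmatrix}
$$

Example 2.2 (m = 6)

$$
W = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
w_{2,1} & 1 & 0 & 0 & 0 & w_{2,6} \\
w_{3,1} & w_{3,2} & 1 & 0 & w_{3,5} & w_{3,6} \\
w_{4,1} & w_{4,2} & 0 & 1 & w_{4,5} & w_{4,6} \\
w_{5,1} & 0 & 0 & 0 & 1 & w_{5,6} \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix},
\quad
Z = \begin{bmatrix}
z_{1,1} & z_{1,2} & z_{1,3} & z_{1,4} & z_{1,5} & z_{1,6} \\
0 & z_{2,2} & z_{2,3} & z_{2,4} & z_{2,5} & 0 \\
0 & 0 & z_{3,3} & z_{3,4} & 0 & 0 \\
0 & 0 & z_{4,3} & z_{4,4} & 0 & 0 \\
0 & z_{5,2} & z_{5,3} & z_{5,4} & z_{5,5} & 0 \\
z_{6,1} & z_{6,2} & z_{6,3} & z_{6,4} & z_{6,5} & z_{6,6}
\end{bmatrix}
$$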

Without loss of generality we assume that m is even. For linear programming, we
can always append a nonbinding constraint so that the total number of constraints is
even.
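
The zero structure of the two quadrant factors can be generated for any order m, which is a convenient way to check the index ranges against Examples 2.1 and 2.2. The following is a sketch under the assumption that the W pattern consists of the unit diagonal plus the strict interiors of the left and right quadrants, and that the Z pattern is everything else, including both diagonals; the function name `quadrant_masks` is illustrative.

```python
import numpy as np

def quadrant_masks(m):
    """Boolean masks of the potentially nonzero positions of W and Z (order m)."""
    i, j = np.indices((m, m)) + 1          # 1-based row and column indices
    anti = m + 1 - i                       # column of the secondary diagonal in row i
    left = j < np.minimum(i, anti)         # strictly inside the left quadrant
    right = j > np.maximum(i, anti)        # strictly inside the right quadrant
    w_mask = left | right | (i == j)       # W also carries a unit diagonal
    z_mask = ~(left | right)               # upper and lower quadrants plus both diagonals
    return w_mask, z_mask
```

For m = 5 this gives 17 potentially nonzero positions in Z and 8 off-diagonal ones in W, matching Example 2.1.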

The set of all LRQ matrices of order m is denoted by $\{M_m^W\}$ and the set of all ULQ
matrices of order m is denoted by $\{M_m^Z\}$. Let $A \in R^{m \times m}$ and $A = A_{i,j}\, e_i e_j^T$. If
$(A + I) \in \{M_m^W\}$ we say that $A_{i,j}$ is a W-element; otherwise, it is a non-W-element.
Similarly, if $A \in \{M_m^Z\}$ we say that $A_{i,j}$ is a Z-element; otherwise, it is a non-Z-element.

Proposition 2.1

$\{M_m^W\}$ and $\{M_m^Z\}$ are closed under addition, scalar multiplication, multiplication and
inversion.

(The proof of this Proposition may be found in Zaki [1986-1]).


2.2  The  Forward  Quadrant  Interlocking  Factorization  Algorithm 


In this section we present an algorithm which obtains the WZ factorization of any
nonsingular matrix. That is, given a nonsingular matrix B, find W and Z such that
B = WZQ, where Q is a permutation matrix. This factorization is analogous to the LU
factorization in common use in many production linear programming packages.

Definition 2.1

An elementary left-right quadrant (ELRQ) matrix of order m and index k is a matrix of
the form:

$$N^k = I - u^k e_k^T - v^k e_l^T \qquad (2.5)$$

where

$$l = m-k+1, \quad k \in \{1,2,\ldots,(m/2)-1\}, \qquad (2.6)$$

$$e_i^T u^k = 0 \text{ and } e_i^T v^k = 0 \text{ for } i = 1,2,\ldots,k,\ l,\ l+1,\ldots,m. \qquad (2.7)$$

The conditions (2.7) require that the first k and last k components of $u^k$ and $v^k$ be zero,
that is, $u^k$ and $v^k$ have the form:

$$u^k = (0,0,\ldots,0,\ u^k_{k+1},\ u^k_{k+2},\ \ldots,\ u^k_{m-k},\ 0,0,\ldots,0)^T \qquad (2.8)$$

$$v^k = (0,0,\ldots,0,\ v^k_{k+1},\ v^k_{k+2},\ \ldots,\ v^k_{m-k},\ 0,0,\ldots,0)^T. \qquad (2.9)$$

In general an ELRQ matrix of order m and index k has the form depicted in Figure
2.1. Thus, an ELRQ matrix of index k is a LRQ matrix whose only nonidentity columns
are columns k and l (l = m-k+1). ELRQ matrices are easily inverted. It is apparent that

$$(N^k)^{-1} = I + u^k e_k^T + v^k e_l^T \qquad (2.10)$$

which is also an ELRQ matrix of index k.
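
The inversion rule (2.10) is easy to confirm numerically. The sketch below builds an ELRQ matrix and its claimed inverse from the interior components of u and v; the helper name `elrq` and its argument layout are illustrative assumptions.

```python
import numpy as np

def elrq(m, k, u_mid, v_mid):
    """Build N^k of (2.5) and the inverse claimed in (2.10); k is 1-based.

    u_mid and v_mid hold components k+1, ..., m-k of u^k and v^k; the first k
    and last k components are zero, as required by (2.7).
    """
    l = m - k + 1
    u = np.zeros(m)
    v = np.zeros(m)
    u[k:m - k] = u_mid                     # 0-based slice for 1-based rows k+1..m-k
    v[k:m - k] = v_mid
    e_k = np.eye(m)[:, k - 1]
    e_l = np.eye(m)[:, l - 1]
    N = np.eye(m) - np.outer(u, e_k) - np.outer(v, e_l)
    N_inv = np.eye(m) + np.outer(u, e_k) + np.outer(v, e_l)
    return N, N_inv
```

The product `N @ N_inv` reduces to the identity because components k and l of both vectors vanish, so every cross term in the product is zero.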

Proposition 2.2

Let

$$N^{(k)} = N^1 N^2 \cdots N^k \qquad (2.11)$$

where $N^i$ is an ELRQ matrix of index i, i = 1,2,...,k. Then $N^{(k)}$ is a LRQ matrix whose
jth and (m-j+1)st columns are those of $N^j$.

(The proof of this Proposition may be found in Zaki [1986-1]).

Definition 2.2

A partially reduced upper-lower quadrant (PRULQ) matrix of index k and order m is a
square matrix whose non-Z-elements are zero in columns 1 through k-1 and l+1 through
m, where k = 1,2,...,m/2 and l = m-k+1. Its general form is shown in Figure 2.2. Note
that $B^1$ has no special zero structure and $B^{m/2}$ is an ULQ matrix.

Proposition 2.3

Let $B^k$ be a PRULQ matrix of index k. If $B^k$ is nonsingular then there exist $j_1$ and $j_2$
such that $k \le j_1 < j_2 \le l$ and

$$\delta = B^k_{k,j_1} B^k_{l,j_2} - B^k_{k,j_2} B^k_{l,j_1} \ne 0. \qquad (2.12)$$

Proof

Suppose $\delta = 0$ for every $k \le j_1 < j_2 \le l$. Then row l of $B^k$ must be a multiple of row k.
This contradicts the assumption that $B^k$ is nonsingular.

Permuting the columns of a PRULQ matrix so that certain elements provide a
nonsingular 2x2 submatrix is analogous to interchanging rows and columns in matrix
inversion to obtain a nonzero pivot element. Now, let $B^k$ be a nonsingular PRULQ matrix of
index k. Let $j_1$ and $j_2$ satisfy Proposition 2.3 and define $Q^k$ to be the permutation matrix
such that

$$\bar{B}^k = B^k Q^k$$

where

$$\bar{B}^k_{\cdot k} = B^k_{\cdot j_1} \text{ and } \bar{B}^k_{\cdot l} = B^k_{\cdot j_2}. \qquad (2.13)$$

Let A be any square matrix of order m and let $k \in \{1,\ldots,m/2\}$. Define $S_k(A)$ to be the
following 2x2 principal submatrix of A

$$S_k(A) = \begin{bmatrix} A_{k,k} & A_{k,l} \\ A_{l,k} & A_{l,l} \end{bmatrix} \qquad (2.14)$$

where l = m-k+1. Using these definitions and Proposition 2.3, it is clear that
$\bar{B}^k = B^k Q^k$ is a nonsingular PRULQ matrix of index k and $S_k(\bar{B}^k)$ is nonsingular.

We now show how one may transform a PRULQ matrix of index k into a PRULQ
matrix of index k+1.

Proposition 2.4

Let $B^k$ be a nonsingular PRULQ matrix of index k and let $Q^k$ be the permutation matrix
that interchanges columns k and m-k+1 with columns $j_1$ and $j_2$, respectively, where $j_1$
and $j_2$ are obtained so that they satisfy Proposition 2.3. Let $N^k$ be an ELRQ matrix of
index k whose $u^k$ and $v^k$ vectors are determined by solving the following (m-2k) 2x2
linear systems

$$\begin{bmatrix} u_i^k & v_i^k \end{bmatrix} S_k(\bar{B}^k) = \begin{bmatrix} \bar{B}^k_{i,k} & \bar{B}^k_{i,l} \end{bmatrix}, \quad i = k+1,\ldots,m-k. \qquad (2.15)$$

Then $B^{k+1} = N^k B^k Q^k$ is a nonsingular PRULQ matrix of index k+1.
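Proof

Since $B^k$ is nonsingular and $N^k$ is nonsingular, $B^{k+1}$ is nonsingular. $B^{k+1}$ is a
PRULQ matrix of index k+1 if all non-Z-elements in columns 1 through k and l through
m are zero. Since $\bar{B}^k$ is a PRULQ matrix of index k, we only need to show that the
effect of $N^k$ on $\bar{B}^k$ is to zero out the non-Z-elements in columns k and l. To show this, we
begin by rewriting (2.15) as

$$\begin{bmatrix} u_i^k & v_i^k \end{bmatrix} \begin{bmatrix} \bar{B}^k_{k,k} & \bar{B}^k_{k,l} \\ \bar{B}^k_{l,k} & \bar{B}^k_{l,l} \end{bmatrix} = \begin{bmatrix} \bar{B}^k_{i,k} & \bar{B}^k_{i,l} \end{bmatrix}$$

or, for i = k+1, k+2, ..., m-k,

$$u_i^k \bar{B}^k_{k,k} + v_i^k \bar{B}^k_{l,k} = \bar{B}^k_{i,k} \qquad (2.16)$$

$$u_i^k \bar{B}^k_{k,l} + v_i^k \bar{B}^k_{l,l} = \bar{B}^k_{i,l}. \qquad (2.17)$$

We now consider the non-Z-elements of $B^{k+1}$.
For i = k+1, k+2, ..., m-k,

$$B^{k+1}_{i,k} = N^k_{i\cdot}\, \bar{B}^k_{\cdot k} = -u_i^k \bar{B}^k_{k,k} - v_i^k \bar{B}^k_{l,k} + \bar{B}^k_{i,k} = 0 \text{ by (2.16)}. \qquad (2.18)$$

$$B^{k+1}_{i,l} = N^k_{i\cdot}\, \bar{B}^k_{\cdot l} = -u_i^k \bar{B}^k_{k,l} - v_i^k \bar{B}^k_{l,l} + \bar{B}^k_{i,l} = 0 \text{ by (2.17)}. \qquad (2.19)$$

$$B^{k+1}_{i,j} = N^k_{i\cdot}\, \bar{B}^k_{\cdot j} = 0 \text{ for } j = 1,\ldots,k-1 \text{ and } l+1,\ldots,m. \qquad (2.20)$$

Also we note that the desired zeros created in earlier stages in $B^k$ are not affected by $N^k$,
since for i = 1,...,k and i = l,...,m

$$B^{k+1}_{i\cdot} = N^k_{i\cdot}\, \bar{B}^k = e_i^T \bar{B}^k = \bar{B}^k_{i\cdot}. \qquad (2.21)$$

From (2.18) through (2.21) we conclude that $B^{k+1}$ is a PRULQ matrix of index k+1.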

Given the above definitions, the forward quadrant interlocking factorization
algorithm may be stated as follows.

Algorithm 2.1 : The Forward Quadrant Interlocking Factorization

Let $B \in R^{m \times m}$. The following steps decompose B into its quadrant interlocking factors
with B = W Z Q.

Initialize

$B^1 = B$,

K = m/2.

Main Loop

For k = 1,2,...,K-1

1. Column Permutation

Find $j_1$ and $j_2$ satisfying Proposition 2.3.

If none exists, then terminate with the conclusion that B is singular.
Otherwise, construct $Q^k$ using $j_1$ and $j_2$.

2. Compute the vectors $u^k$, $v^k$

by solving the (m-2k) 2x2 linear systems (2.15).

3. Construct $N^k$

$N^k = I - u^k e_k^T - v^k e_l^T$.

4. Construct $B^{k+1}$

$B^{k+1} = N^k B^k Q^k$.

Next k
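
A dense, sequential sketch of the main loop of Algorithm 2.1 is given below for illustration. It makes two simplifying assumptions that are not part of the algorithm as stated: column permutations are skipped (Q = I), so the routine simply stops if the central 2x2 block is numerically singular, and W is assembled directly from the vectors $u^k$ and $v^k$ using the column property of Proposition 2.2. The name `forward_qif` and the tolerance are illustrative.

```python
import numpy as np

def forward_qif(B, tol=1e-12):
    """Factor B = W Z without column permutations (m even); returns (W, Z)."""
    Z = np.array(B, dtype=float)
    m = Z.shape[0]
    if Z.shape != (m, m) or m % 2 != 0:
        raise ValueError("B must be square with an even order")
    W = np.eye(m)
    for k in range(m // 2 - 1):            # 0-based counterpart of k = 1, ..., K-1
        l = m - 1 - k
        S = Z[[k, l]][:, [k, l]]           # 2x2 block S_k of the current matrix
        if abs(S[0, 0] * S[1, 1] - S[0, 1] * S[1, 0]) < tol:
            raise np.linalg.LinAlgError("singular 2x2 block; a column permutation is needed")
        mid = slice(k + 1, l)              # rows strictly between k and l
        rhs = Z[mid][:, [k, l]]
        # Solve [u_i v_i] S = [B_ik B_il] for all interior rows at once, as in (2.15).
        uv = np.linalg.solve(S.T, rhs.T).T
        # Apply N^k: subtract u_i * (row k) and v_i * (row l) from each interior row.
        Z[mid] -= np.outer(uv[:, 0], Z[k]) + np.outer(uv[:, 1], Z[l])
        # Columns k and l of W hold u^k and v^k below the unit diagonal.
        W[mid, k] = uv[:, 0]
        W[mid, l] = uv[:, 1]
    return W, Z
```

The (m-2k) small systems solved at each stage are independent of one another, which is the property that makes the factorization attractive for concurrent execution.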


Proposition 2.5

Let B be a nonsingular matrix of size m. Then Algorithm 2.1 decomposes B into its
forward quadrant interlocking factors,

$$B = W Z Q \qquad (2.22)$$

where

(1) $W \in \{M_m^W\}$, $W = (N^{K-1} N^{K-2} \cdots N^1)^{-1}$,

(2) $Z \in \{M_m^Z\}$, $Z = B^K$, and

(3) Q is a permutation matrix, $Q = (Q^1 Q^2 \cdots Q^{K-1})^{-1}$.

Proof

Let $B^1 = B$. Applying Proposition 2.4 for k = 1,2,...,(m/2)-1, we obtain

$$B^K = N^{K-1} N^{K-2} \cdots N^1 B^1 Q^1 \cdots Q^{K-2} Q^{K-1}, \qquad (2.23)$$

where $B^K$ is an ULQ matrix, $N^j$, j = 1,...,K-1, are ELRQ matrices as computed in (2.15)
and $Q^j$ are permutation matrices. From (2.23),

$$B^1 = (N^{K-1} N^{K-2} \cdots N^1)^{-1} B^K (Q^1 \cdots Q^{K-2} Q^{K-1})^{-1}. \qquad (2.24)$$

Let $N^{(K-1)} = (N^{K-1} N^{K-2} \cdots N^1)^{-1}$. By Proposition 2.2, $N^{(K-1)}$ is a LRQ matrix. Also,
let $Q^{(K-1)} = (Q^1 \cdots Q^{K-2} Q^{K-1})^{-1}$. Since the product of permutation matrices is a
permutation matrix, $Q^{(K-1)}$ is a permutation matrix. Thus, (2.24) can be written as

$$B^1 = B = N^{(K-1)} B^K Q^{(K-1)}, \qquad (2.25)$$

and (2.22) follows by setting $W = N^{(K-1)}$, $Z = B^K$, and $Q = Q^{(K-1)}$ in (2.25).

Proposition 2.6

Algorithm 2.1 without column permutations requires

$$m^3/3 + m^2/2 - 4m/3$$

multiplications on a sequential machine.

Proof

Ignoring column permutations, we trace the operations in the main loop excluding step 1.

The number of multiplications to compute $u^k$ and $v^k$

$$= \sum_{k=1}^{K-1} [2 + 6(m-2k)]$$

$$= m + 3m(m-2)/2. \qquad (2.26)$$

The number of multiplications to compute $B^{k+1}$

$$= \sum_{k=1}^{K-1} 2(m-2k)^2$$

$$= m(m-1)(m-2)/3. \qquad (2.27)$$

Summing (2.26) and (2.27) we obtain the specified total number of multiplications.

In  Algorithm  2.1  the  columns  of  the  PRULQ  matrix  are  permuted  to  find  a  2x2 
matrix  with  a  nonzero  determinant.  There  are  obvious  alternatives  that  may  be  used.  To 
ensure  numerical  stability  for  instance,  we  may  find  the  matrix  whose  determinant  has 
the  largest  absolute  value,  or  the  matrix  that  has  the  smallest  condition  number.  Another 
approach  is  to  permute  the  rows  of  the  PRULQ  matrix  to  find  the  required  nonsingular 
2x2  matrix  attempting  to  minimize  fill-in  in  the  nonpivot  rows.  Both  row  and/or  column 
permutations  can  be  selected  on  numerical  stability  and/or  sparsity  grounds. 
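
One of the alternatives mentioned above, choosing the column pair whose 2x2 determinant has the largest absolute value, can be sketched as follows. The function name `select_pivot_columns` and the exhaustive search over all column pairs are illustrative assumptions; the paper does not prescribe a particular search strategy.

```python
import itertools
import numpy as np

def select_pivot_columns(B, k):
    """Pick columns j1 < j2 in [k, l] maximizing |det| of the 2x2 block in rows k and l.

    Indices are 0-based, so l = m - 1 - k. Returns ((j1, j2), det), or
    (None, 0.0) if every candidate determinant is zero.
    """
    m = B.shape[0]
    l = m - 1 - k
    best, best_det = None, 0.0
    for j1, j2 in itertools.combinations(range(k, l + 1), 2):
        det = B[k, j1] * B[l, j2] - B[k, j2] * B[l, j1]   # delta of (2.12)
        if abs(det) > abs(best_det):
            best, best_det = (j1, j2), det
    return best, best_det
```

A `None` result corresponds to the singular case of step 1 of Algorithm 2.1; otherwise the selected pair defines the permutation $Q^k$.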

2.3  The  Backward  Quadrant  Interlocking  Factorization  Algorithm 

Unlike the triangular factors (L,U) of a matrix, the quadrant interlocking factors
(W,Z) possess different potential density. That is, the number of potentially nonzero
elements in W is different from that in Z. In this section we present an algorithm which
obtains the ZW factorization of any nonsingular matrix. We refer to this algorithm as the
Backward QIF algorithm, as opposed to the Forward QIF algorithm of Section 2.2 that
produces the WZ factorization. The development of this algorithm is very similar to the
previous one. The proofs of Propositions 2.7 through 2.10 in this section use arguments
similar to those used in Propositions 2.2 through 2.5 and hence are omitted.

Definition 2.3

An elementary upper-lower quadrant (EULQ) matrix of order m and index k is a matrix
of the form:

$$M^k = I_m - r^k e_k^T - s^k e_l^T - e_k e_k^T - e_l e_l^T \qquad (2.28)$$

where

$$l = m-k+1, \quad k \in \{1,2,\ldots,m/2\},$$

$$e_i^T r^k = 0 \text{ and } e_i^T s^k = 0 \text{ for } i = k+1,k+2,\ldots,l-1. \qquad (2.29)$$

The conditions (2.29) require that components k+1 through m-k of $r^k$ and $s^k$ be zero,
which are the non-Z-elements of $r^k$ and $s^k$ in $M^k$. That is, $r^k$ and $s^k$ have the form:

$$r^k = (r^k_1,\ldots,r^k_k,\ 0,\ldots,0,\ r^k_l,\ldots,r^k_m)^T \qquad (2.30)$$

$$s^k = (s^k_1,\ldots,s^k_k,\ 0,\ldots,0,\ s^k_l,\ldots,s^k_m)^T. \qquad (2.31)$$

Thus, an EULQ matrix of index k and order m is an ULQ matrix whose only nonidentity
columns are columns k and l (l = m-k+1). In general, it has the form depicted in Figure
2.3.

The set of all nonsingular EULQ matrices is closed under inversion, and the inverse
of any nonsingular EULQ matrix of index k is another EULQ matrix of index k.
Proposition  2.7 

Let  M(k  *  =  MkMk~]  ■  ■  ■  where  M‘  is  an  EULQ  matrix  of  index  i  ,  i=l,2 . k.  Then 

M(k]  is  a  ULQ  matrix  whose  j,h  and  (m-j+\)st  columns  are  those  of  MJ  ,  j=\,2,...,m/2. 
The  proof  is  similar  to  that  of  Proposition  2.2. 

Definition  2.4 

A partially reduced left-right quadrant (PRLRQ) matrix of index $k$ and order $m$ is a
square matrix whose non-W-elements are zero in columns $k+1$ through $m-k$. Note that
$B^{m/2}$ has no special zero structure and $B^0$ is an LRQ matrix. In general, a PRLRQ matrix
is of the form shown in Figure 2.4.


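Proposition 2.8

Let $B^k$ be a PRLRQ matrix of index $k$. If $B^k$ is nonsingular then there exist $j_1$ and $j_2$
such that $1 \le j_1 \le k$ and $l \le j_2 \le m$ and

\[ \det \begin{bmatrix} B^k_{k,j_1} & B^k_{k,j_2} \\ B^k_{l,j_1} & B^k_{l,j_2} \end{bmatrix} \ne 0. \tag{2.32} \]

The proof is similar to that of Proposition 2.3.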

Now let $j_1$ and $j_2$ satisfy Proposition 2.8 and define $P^k$ to be a permuted identity
matrix with column $j_1$ in the $k$th position and $j_2$ in the $l$th. Let $B^k$ be a nonsingular
PRLRQ matrix of index $k$. Obviously, $\bar{B}^k = B^k P^k$ is a nonsingular PRLRQ matrix of
index $k$, and $S_k(\bar{B}^k)$ is nonsingular.

Using  Mk  of  (2.28)  and  the  Pk  defined  above,  the  elimination  operation  needed  to 
reduce  a  PRLRQ  matrix  of  index  k  a  step  further  is  given  by  the  following  Proposition. 

Proposition  2.9 

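Let $B^k$ be a nonsingular PRLRQ matrix of index $k$, let $j_1$ and $j_2$ satisfy Proposition 2.8,
and let $P^k$ be the permutation matrix that permutes columns $k$ and $j_1$ and columns $m-k+1$
and $j_2$, with $\bar{B}^k = B^k P^k$. Let $M^k$ be an EULQ matrix of index $k$ whose $r^k, s^k$ vectors are
determined by solving the following $2k-2$ linear systems

\[ \begin{bmatrix} r_i^k & s_i^k \end{bmatrix} S_k(\bar{B}^k) = \begin{bmatrix} \bar{B}^k_{i,k} & \bar{B}^k_{i,l} \end{bmatrix}, \quad i = 1,\ldots,k-1 \ \text{and} \ l+1,\ldots,m \tag{2.33} \]

along with the system

\[ \begin{bmatrix} r_k^k & s_k^k \\ r_l^k & s_l^k \end{bmatrix} = -\left[ S_k(\bar{B}^k) \right]^{-1}. \tag{2.34} \]

Then $B^{k-1} = M^k B^k P^k$ is a nonsingular PRLRQ matrix of index $k-1$.
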
Given the above definitions, we may state the Backward QIF algorithm as follows:


Algorithm  2.2  :  The  Backward  Quadrant  Interlocking  Factorization 

Let $B \in R^{m \times m}$. The following steps decompose $B$ to its QIF with $B = Z\,W\,P$.

Initialize

$B^{m/2} = B$,

$K = m/2$.

Main Loop

For $k = K, K-1, K-2, \ldots, 1$

1. Column Permutation

Find $j_1$ and $j_2$ satisfying Proposition 2.8.

If none exists, then terminate with the conclusion that $B$ is singular.
Otherwise, construct $P^k$ using $j_1$ and $j_2$.

2. Compute the vectors $r^k, s^k$

by solving the $(2k-1)$ 2x2 linear systems (2.33) and (2.34).

3. Construct $M^k$

$M^k = I_m - r^k e_k^T - s^k e_l^T - e_k e_k^T - e_l e_l^T$.

4. Construct $B^{k-1}$

$B^{k-1} = M^k B^k P^k$.

Next $k$


Proposition 2.10

Let $B$ be a nonsingular matrix of size $m$. Then Algorithm 2.2 decomposes $B$ to its
backward QIF,

\[ B = Z\,W\,P \tag{2.35} \]

where

(1) $Z$ is a ULQ matrix, $Z = (M^1 M^2 \cdots M^K)^{-1}$,

(2) $W$ is an LRQ matrix, $W = B^0$, and

(3) $P$ is a permutation matrix, $P = (P^K \cdots P^1)^{-1}$.

The proof is similar to that of Proposition 2.5.

As with the Forward QIF Algorithm, row and/or column permutations can be
adopted to ensure numerical stability and/or sparse factors.


2.4  Some  Characteristics  of  Quadrant  Matrices 

In this section we reveal a relationship between the quadrant and triangular
matrices, which has not previously appeared in the open literature (e.g. Evans and
Hatzopoulos [1979-1], Evans and Hadjidimos [1980-1], Evans [1982-1], Feilmeier [1982-1],
Hellier [1982-1], and Shanehchi and Evans [1982-1]). A permutation algorithm that
restructures any quadrant matrix as a block triangular one is presented.

Consider the following matrices

\[
Z = \begin{bmatrix}
x & x &   &   &        &   &   \\
x & x &   &   &        &   &   \\
x & x & x & x &        &   &   \\
x & x & x & x &        &   &   \\
\vdots & \vdots & \vdots & \vdots & \ddots &   &   \\
x & x & x & x & \cdots & x & x \\
x & x & x & x & \cdots & x & x
\end{bmatrix},
\qquad
W = \begin{bmatrix}
1 & 0 & x & x & \cdots & x & x \\
  & 1 & x & x & \cdots & x & x \\
  &   & 1 & 0 & \cdots & x & x \\
  &   &   & 1 & \cdots & x & x \\
  &   &   &   & \ddots & \vdots & \vdots \\
  &   &   &   &        & 1 & 0 \\
  &   &   &   &        &   & 1
\end{bmatrix}
\tag{2.36}
\]

where $x$ stands for a potential nonzero element. Note that $Z$ is a lower Hessenberg
matrix with a special zero distribution on the superdiagonal. Also, $W$ is a unit upper
triangular matrix with a special zero distribution on the superdiagonal.

Now we present an algorithm that relates $W$ of (2.1) and $Z$ of (2.3) to $W$ and $Z$ of
(2.36).

Algorithm 2.3 : The Permutation Algorithm

Let $R$, $S$, and $T$ be square matrices of order $m$, where $R$ is the input matrix to the
algorithm and $T$ is the output matrix. The following algorithm permutes the columns and
rows of $R$ such that:

(a) if $R$ is an LRQ matrix then $T$ is a $W$ of (2.36), and

(b) if $R$ is a ULQ matrix then $T$ is a $Z$ of (2.36).

1. Column Permutation
For $j = 1,2,\ldots,m/2$

$S_{\cdot,\,m-2j+1} \leftarrow R_{\cdot,\,j}$
$S_{\cdot,\,m-2j+2} \leftarrow R_{\cdot,\,m-j+1}$

Next $j$

2. Row Permutation
For $i = 1,2,\ldots,m/2$

$T_{m-2i+1,\,\cdot} \leftarrow S_{i,\,\cdot}$
$T_{m-2i+2,\,\cdot} \leftarrow S_{m-i+1,\,\cdot}$

Next $i$


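An example of the permutation algorithm is given below for $m = 6$.

Example 2.3 ($m = 6$)

\[
W = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
w_{2,1} & 1 & 0 & 0 & 0 & w_{2,6} \\
w_{3,1} & w_{3,2} & 1 & 0 & w_{3,5} & w_{3,6} \\
w_{4,1} & w_{4,2} & 0 & 1 & w_{4,5} & w_{4,6} \\
w_{5,1} & 0 & 0 & 0 & 1 & w_{5,6} \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\longrightarrow
\begin{bmatrix}
1 & 0 & w_{3,2} & w_{3,5} & w_{3,1} & w_{3,6} \\
0 & 1 & w_{4,2} & w_{4,5} & w_{4,1} & w_{4,6} \\
0 & 0 & 1 & 0 & w_{2,1} & w_{2,6} \\
0 & 0 & 0 & 1 & w_{5,1} & w_{5,6} \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\]

\[
Z = \begin{bmatrix}
z_{1,1} & z_{1,2} & z_{1,3} & z_{1,4} & z_{1,5} & z_{1,6} \\
0 & z_{2,2} & z_{2,3} & z_{2,4} & z_{2,5} & 0 \\
0 & 0 & z_{3,3} & z_{3,4} & 0 & 0 \\
0 & 0 & z_{4,3} & z_{4,4} & 0 & 0 \\
0 & z_{5,2} & z_{5,3} & z_{5,4} & z_{5,5} & 0 \\
z_{6,1} & z_{6,2} & z_{6,3} & z_{6,4} & z_{6,5} & z_{6,6}
\end{bmatrix}
\longrightarrow
\begin{bmatrix}
z_{3,3} & z_{3,4} & 0 & 0 & 0 & 0 \\
z_{4,3} & z_{4,4} & 0 & 0 & 0 & 0 \\
z_{2,3} & z_{2,4} & z_{2,2} & z_{2,5} & 0 & 0 \\
z_{5,3} & z_{5,4} & z_{5,2} & z_{5,5} & 0 & 0 \\
z_{1,3} & z_{1,4} & z_{1,2} & z_{1,5} & z_{1,1} & z_{1,6} \\
z_{6,3} & z_{6,4} & z_{6,2} & z_{6,5} & z_{6,1} & z_{6,6}
\end{bmatrix}
\]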


This clearly shows that the quadrant matrices are permuted block triangular
matrices with blocks of size 2. That is, the Forward (Backward) Quadrant Interlocking
factorization is equivalent to a block Doolittle (Crout) decomposition with blocks of size 2.
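The permutation can be checked mechanically on zero patterns. The following Python sketch is only an illustration: the helper names (`lrq_pattern`, `ulq_pattern`, `permute_quadrant`) and the 0 / 1 / "x" encoding of the patterns are assumptions made here, not notation from the report. It applies the column and row permutations of the Permutation Algorithm to an LRQ and a ULQ pattern.

```python
def lrq_pattern(m):
    """Zero pattern of an LRQ (W) matrix of even order m: 1 on the diagonal,
    'x' where a nonzero may occur, 0 elsewhere (formulas use 1-based indices)."""
    W = [[0] * m for _ in range(m)]
    for i in range(1, m + 1):
        for j in range(1, m + 1):
            lo, hi = min(j, m + 1 - j), max(j, m + 1 - j)
            if i == j:
                W[i - 1][j - 1] = 1
            elif lo < i < hi:
                W[i - 1][j - 1] = "x"
    return W


def ulq_pattern(m):
    """Zero pattern of a ULQ (Z) matrix of even order m."""
    Z = [[0] * m for _ in range(m)]
    for i in range(1, m + 1):
        for j in range(1, m + 1):
            lo, hi = min(j, m + 1 - j), max(j, m + 1 - j)
            if i <= lo or i >= hi:
                Z[i - 1][j - 1] = "x"
    return Z


def permute_quadrant(R):
    """Column permutation followed by row permutation, as in the text."""
    m = len(R)
    S = [[0] * m for _ in range(m)]
    for j in range(1, m // 2 + 1):
        for r in range(m):
            S[r][m - 2 * j] = R[r][j - 1]        # column m-2j+1 <- column j
            S[r][m - 2 * j + 1] = R[r][m - j]    # column m-2j+2 <- column m-j+1
    T = [None] * m
    for i in range(1, m // 2 + 1):
        T[m - 2 * i] = S[i - 1][:]               # row m-2i+1 <- row i
        T[m - 2 * i + 1] = S[m - i][:]           # row m-2i+2 <- row m-i+1
    return T
```

Calling `permute_quadrant(lrq_pattern(6))` gives a pattern with 2x2 identity blocks on the diagonal and zeros below them, and `permute_quadrant(ulq_pattern(6))` gives a block lower triangular pattern.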


On sequential computers, a QIF is not expected to be faster than any triangular
decomposition, since computing the entries of the factors by solving 2x2 systems
requires more operations, as shown in Proposition 2.6. Also, finding a nonsingular 2x2
submatrix is more expensive than finding a nonzero element. However, on parallel
computers, the QIF is expected to be competitive, since the number of entries that can be
produced concurrently in every stage is doubled, and the number of stages is halved as
compared to a triangular factorization algorithm. Therefore, we may view the column
permutation step in Algorithms 2.1 and 2.2, which searches for a nonsingular 2x2 submatrix,
as a computation decoupling price we pay for the concurrency gained in steps 2-4.

Determining the relationship between quadrant and triangular matrices is a key
observation that we will use in the following section to design an appropriate updating
scheme for the quadrant interlocking factors of the basis matrix in the simplex method.

III. UPDATING THE QIF OF THE BASIS

At the beginning of a simplex iteration, suppose the basis has the form

\[ B = Z\,W\,R, \tag{3.1} \]

where we assume forms (2.36) for $Z$ and $W$, and $R$ is a permutation matrix. When the
entering column $A_{\cdot j}$ replaces the leaving column $B_{\cdot p}$ at the end of the simplex iteration,
we have a new basis matrix $\bar{B}$ which is related to the previous basis matrix $B$ by the
formula

\[ \bar{B} = B\,E \tag{3.2} \]

where $E$ is an eta matrix whose $p$th column is $(B^{-1} A_{\cdot j})$, and all other columns are the
identity columns. From (3.1) and (3.2) $\bar{B}$ can be written as

\[ \bar{B} = Z\,W\,R\,E. \tag{3.3} \]

An updating scheme is a sequence of operations applied to the right side of (3.3) to
return it to the form given by (3.1), i.e.

\[ \bar{B} = \bar{Z}\,\bar{W}\,\bar{R}, \tag{3.4} \]

where $\bar{W}$, $\bar{Z}$ are the new Q.I. factors and $\bar{R}$ is a permutation matrix. We present an
algorithm designed to derive (3.4) given (3.3). It is similar to the Forrest-Tomlin [1972-1]
update for the triangular factors of the basis. Since the spike is in $W$, our strategy is to
reduce the spiked $W$, i.e., $WE$, to an LRQ matrix using elementary ULQ matrices. The
following algorithm exploits the triangular form of $W$ and the existence of 2x2 identity
blocks on the diagonal of $W$.
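The relation between the old and the new basis through the eta matrix can be illustrated on a tiny example. The sketch below is an assumption-laden illustration (the function names and the 3x3 data are invented here): it builds $E$ as the identity with its $p$th column replaced by $B^{-1} A_{\cdot j}$, using exact fractions and a plain Gauss-Jordan solve, so that the product $B E$ can be compared with $B$ after the column exchange.

```python
from fractions import Fraction


def solve(B, a):
    """Solve B x = a by Gauss-Jordan elimination with exact fractions."""
    n = len(B)
    M = [[Fraction(v) for v in row] + [Fraction(a[i])]
         for i, row in enumerate(B)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]


def eta_matrix(B, a, p):
    """Identity matrix whose column p (0-based) is replaced by B^{-1} a."""
    n = len(B)
    col = solve(B, a)
    E = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for i in range(n):
        E[i][p] = col[i]
    return E


def matmul(X, Y):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]
```

For a nonsingular `B`, `matmul(B, eta_matrix(B, a, p))` reproduces `B` with column `p` replaced by `a`, which is the column exchange performed at the end of a simplex iteration.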

In this presentation we use the term brother columns (rows) to indicate columns
(rows) that have the same potential nonzero structure, excluding the diagonal entries in
the case of LRQ matrices. Thus, for LRQ matrices in the form of (2.36), columns (rows)
$i, i+1$ are brother columns (rows) for $i = 1,3,\ldots,m-1$.


The first step of this scheme is a column permutation followed by a row permutation.
In Figure 3.1 an example is presented to illustrate this step, in which $R$ of (3.3)
permutes columns 2 and 4 of $W$ and $x$ stands for potentially nonzero elements. Thus, $W$ and
$WR$ are as illustrated in Figure 3.1 (a) and (b). From (3.3) we obtain

\[ Z^{-1} \bar{B} = W R E = S, \]

where $S$ is illustrated in Figure 3.1 (c) and $y$ stands for the elements of the column vector
$(Z^{-1} A_{\cdot j})$. Note that if $(Z^{-1} A_{\cdot j})$ has the same zero structure as $W_{\cdot q}$, then the new
factors are immediately available. That is, $\bar{W}$ is $S$ and $\bar{Z}$ is $Z$. If this is not the case, we
place $S$ in a spiked-$W$ form $\hat{S}$ as shown in Figure 3.1 (d), by applying the column
permutation $R^{-1}$ to $S$ to undo the effect of $R$. That is,

\[ Z^{-1} \bar{B} R^{-1} = W R E R^{-1} = S R^{-1} = \hat{S}. \tag{3.5} \]

Suppose $q < m-1$. We apply the column permutation $\hat{R}$ to $\hat{S}$, placing the spike and the
brother of the leaving column in the positions $m$ and $m-1$, respectively, and moving all
intervening columns forward to produce the matrix $\hat{H}^{\bar{q}}$, as illustrated in Figure 3.1 (e).
We then apply the row permutation $\hat{R}^{-1}$ to $\hat{H}^{\bar{q}}$, placing the $q$th row and its brother row in
positions $m$ and $m-1$, respectively, and moving all intervening rows two places up to
produce the matrix $H^{\bar{q}}$ as shown in Figure 3.1 (f), where

\[ \bar{q} = \begin{cases} q, & \text{if } q \text{ is odd} \\ q-1, & \text{if } q \text{ is even.} \end{cases} \]

Note that $\bar{q}$ is odd. Of course, if $q \ge m-1$, then $\hat{R} = I$. Now (3.5) becomes

\[ \hat{R}^{-1} Z^{-1} \bar{B} R^{-1} \hat{R} = \hat{R}^{-1} W R E R^{-1} \hat{R} = \hat{R}^{-1} \hat{S} \hat{R} = \hat{R}^{-1} \hat{H}^{\bar{q}} = H^{\bar{q}}. \tag{3.6} \]

Figure 3.1. Illustration of the double column and row permutation ($m=8$, $p=2$, $q=4$, $\bar{q}=3$).

Figure 3.2. Illustration of the general form of the matrix $H^l$.

Consider the matrix $H^l$ whose general form is depicted in Figure 3.2. Note that the
matrix resulting from the above permutation is $H^l$ when $l = \bar{q}$. Note also that all
non-W-elements in $H^l$ are in the last two rows in columns $l$ through $m$. Our objective now is
to reduce $H^{\bar{q}}$ to an LRQ matrix by eliminating these non-W-elements. We consider
eliminating them four at a time using the 2x2 identities on the diagonal of $H^l$. The necessary
matrices that reduce $H^l$ to $H^{l+2}$, for $l = \bar{q}, \bar{q}+2, \ldots, m-3$, are the following EULQ
transformations, which are identity matrices except for columns $l$ and $l+1$:

\[ Z^l = I - \left( H^l_{m-1,l}\, e_{m-1} + H^l_{m,l}\, e_m \right) e_l^T - \left( H^l_{m-1,l+1}\, e_{m-1} + H^l_{m,l+1}\, e_m \right) e_{l+1}^T. \]

By repetitive application of $Z^l$ to $H^l$, for $l = \bar{q}, \bar{q}+2, \ldots, m-3$, we get $H^{m-1}$ which, in
general, has a non-W-element in its $(m-1,m)$ entry and a nonconforming element in the
$(m,m)$th entry. Therefore, the following rank-one elementary transformation is sufficient to
reduce $H^{m-1}$ to the LRQ matrix $\bar{W}$,

\[ Z^{m-1} = \begin{bmatrix} I & & \\ & 1 & -H^{m-1}_{m-1,m} / H^{m-1}_{m,m} \\ & & 1 / H^{m-1}_{m,m} \end{bmatrix}. \]

Theoretically, $H^{m-1}_{m,m}$ is a nonzero element, since otherwise $\bar{B}$ is singular. Now,
combining all transformations applied to $H^{\bar{q}}$, we obtain,

\[ Z^{m-1} Z^{m-3} \cdots Z^{\bar{q}} H^{\bar{q}} = \bar{W} \]

and (3.6) becomes,

\[ \{ Z^{m-1} Z^{m-3} \cdots Z^{\bar{q}} \hat{R}^{-1} Z^{-1} \}\, \bar{B}\, \{ R^{-1} \hat{R} \} = Z^{m-1} Z^{m-3} \cdots Z^{\bar{q}} H^{\bar{q}} \tag{3.7} \]

\[ \{ \bar{Z}^{-1} \}\, \bar{B}\, \{ \bar{R}^{-1} \} = \bar{W}, \]

which is equivalent to the required updated form (3.4), with

\[ \bar{Z} = Z \hat{R} (Z^{\bar{q}})^{-1} \cdots (Z^{m-3})^{-1} (Z^{m-1})^{-1}, \tag{3.8} \]

\[ \bar{R} = \hat{R}^{-1} R, \ \text{and} \]

\[ \bar{W} = Z^{m-1} Z^{m-3} \cdots Z^{\bar{q}} H^{\bar{q}}. \]

Note that $\bar{Z}$ in (3.8) is not a ULQ matrix, even though all its factors, except the
permutation matrix $\hat{R}$, are ULQ matrices. In practice, $\bar{Z}^{-1}$ is stored factorized as in the first
braced term in the left hand side of (3.7).
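The elimination just described can be tried on a small matrix. The sketch below is a hedged illustration, not the report's code: the function name `reduce_spiked`, the use of exact fractions, and the 6x6 test data are assumptions made here. It clears the non-W-elements in the last two rows two columns at a time, using the 2x2 identity blocks on the diagonal, and then applies the final rank-one step on the spike column.

```python
from fractions import Fraction


def reduce_spiked(H, qbar):
    """Reduce a permuted spiked matrix H (order m, m even) to a unit upper
    triangular matrix with 2x2 identity diagonal blocks.
    qbar is the 1-based (odd) index of the first block that is touched."""
    m = len(H)
    H = [[Fraction(v) for v in row] for row in H]
    # 1-based l = qbar, qbar+2, ..., m-3: eliminate entries (m-1, l), (m-1, l+1),
    # (m, l), (m, l+1) with the identity block in rows l, l+1.
    for l in range(qbar - 1, m - 3, 2):
        for r in (m - 2, m - 1):
            c0, c1 = H[r][l], H[r][l + 1]
            H[r] = [h - c0 * a - c1 * b
                    for h, a, b in zip(H[r], H[l], H[l + 1])]
    # Final rank-one step: scale row m and clear entry (m-1, m).
    d = H[m - 1][m - 1]
    if d == 0:
        raise ZeroDivisionError("the updated basis is singular")
    c = H[m - 2][m - 1] / d
    H[m - 2] = [a - c * b for a, b in zip(H[m - 2], H[m - 1])]
    H[m - 1] = [b / d for b in H[m - 1]]
    return H
```

The input is expected to have the shape sketched in Figure 3.2: zeros in column $m-1$ for rows $\bar{q}$ through $m-2$, a 1 in position $(m-1,m-1)$, and a 0 in position $(m,m-1)$. Only the last two rows change, which matches the remark that the right factor gains no new nonzeros.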

Using  the  above,  the  updating  algorithm  may  be  stated  as  follows: 

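Algorithm 3.1 : The B.Q.I.F. Updating Algorithm

0. Begin with the $m \times m$ matrix $B = Z\,W\,R$, and suppose column $p$ of
$B$ is replaced by $A_{\cdot j}$.

1. Define $q$ such that $R_{\cdot p} = e_q$.

2. Set

\[ \hat{S}_{\cdot i} \leftarrow \begin{cases} Z^{-1} A_{\cdot j}, & i = q; \\ W_{\cdot i}, & \text{otherwise.} \end{cases} \]

3. Let

\[ \tilde{q} = \begin{cases} q+1, & \text{if } q \text{ is odd;} \\ q-1, & \text{if } q \text{ is even.} \end{cases} \]

4. Set (with $\bar{q}$ as defined in Step 6)

\[ \hat{R}_{\cdot i} \leftarrow \begin{cases} e_i, & 1 \le i < \bar{q}; \\ e_{i+2}, & \bar{q} \le i < m-1; \\ e_{\tilde{q}}, & i = m-1; \\ e_q, & i = m. \end{cases} \]

5. $H \leftarrow \hat{R}^{-1} \hat{S} \hat{R}$.

6. Let

\[ \bar{q} = \begin{cases} q, & \text{if } q \text{ is odd;} \\ q-1, & \text{if } q \text{ is even.} \end{cases} \]

For $l = \bar{q}, \bar{q}+2, \ldots, m-3$.

7. Set

\[ Z^l_{i,l} \leftarrow \begin{cases} 1, & i = l; \\ -H_{m-1,l}, & i = m-1; \\ -H_{m,l}, & i = m; \\ 0, & \text{otherwise.} \end{cases} \]

\[ Z^l_{i,l+1} \leftarrow \begin{cases} 1, & i = l+1; \\ -H_{m-1,l+1}, & i = m-1; \\ -H_{m,l+1}, & i = m; \\ 0, & \text{otherwise.} \end{cases} \]

$Z^l_{\cdot j} \leftarrow e_j$, $j \ne l$ and $j \ne l+1$.

8. $H \leftarrow Z^l H$.

Next $l$.

9. Set

\[ Z^{m-1}_{i,m} \leftarrow \begin{cases} -H_{m-1,m} / H_{m,m}, & i = m-1; \\ 1 / H_{m,m}, & i = m; \\ 0, & \text{otherwise.} \end{cases} \]

$Z^{m-1}_{\cdot j} \leftarrow e_j$, $j \ne m$.

10. $H \leftarrow Z^{m-1} H$.

11. Set

\[ \bar{B} = \{ Z \hat{R} (Z^{\bar{q}})^{-1} \cdots (Z^{m-1})^{-1} \}\, H\, \{ \hat{R}^{-1} R \} = \bar{Z}\,\bar{W}\,\bar{R}. \]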


This updating scheme inherits the major characteristics of the Forrest-Tomlin
update for the triangular factors of the basis. First, no new nonzeros are created in the
right factor $W$, since only deletions of items are required. Therefore, sparsity of $W$ is
preserved and fill-in is minimized. Second, the lack of choice of the pivot elements
makes this update less numerically stable than the Bartels-Golub-based updates. Thus,
there is a gain in speed and storage at some sacrifice in numerical stability.




IV.  PARALLEL  IMPLEMENTATION 


In this section we describe a parallel implementation of two basic tasks of any
simplex based linear programming code, namely, basis reinversion and solution of the linear
systems. A parallel version of the Backward Quadrant Interlocking Factorization
Algorithm (BQIF) is presented in Section 4.1. Only the left factor is produced in its product
form while the right factor is produced in its explicit form. This form conforms with the
updating scheme of Section III. In this algorithm, parallelism is gained by reformulating
the BQIF Algorithm in terms of high-level modules such as matrix-vector operations.
These modules represent a high level of granularity in the algorithm in the sense that they
are based on matrix-vector operations, $O(m^2)$ work, not just vector operations, $O(m)$
work. The module concept has already proven to be very successful in achieving both
transportability and high performance of some linear algebra routines across a wide range
of architectures, as reported by Dongarra and Sorensen [1984-2] and Dongarra and
Hewitt [1986-1].

Given a basic feasible solution with basis $B$, each iteration of Dantzig's simplex
algorithm involves solving the systems of equations $\pi B = c_B$ and $B y = A_{\cdot j}$. An
efficient parallelization of the simplex algorithm requires efficient parallel algorithms for
solution of these systems. Parallel algorithms for solving these linear systems using the
quadrant factors are presented in Section 4.2. The parallel implementation discussed in
this section is proposed for an MIMD parallel computer that incorporates $p$ identical
processors sharing a common memory and capable of multitasking, that is, the processors
are capable of applying all their power to a single job in a timely and coordinated
manner.


4.1  The  Module-Based  BQIF  Algorithm 


Given an $m \times m$ matrix $B$, the algorithm either indicates singularity of $B$ or
produces

\[ Z^{m-1} Z^{m-3} \cdots Z^1 B R = W, \tag{4.1} \]

where $R$ is a permutation matrix and $Z^k$ is a rank-2 modification of the identity, whose only
nonidentity columns are columns $k$ and $k+1$, of the form

\[ Z^k = \begin{bmatrix} I_{k-1} & & \\ & Z^k_{k:k+1,\,k:k+1} & \\ & Z^k_{k+2:m,\,k:k+1} & I_{m-k-1} \end{bmatrix}. \tag{4.2} \]

This form conforms with the updating schemes of Section III. Its LU version has been
used in several LP codes (Reid [1982-1]). At every stage a new $Z^i$ is produced and two
rows of $W$ are updated. The availability of the updated rows of $W$ at every stage allows
for parallel implementation when searching for a nonsingular 2x2 submatrix. Moreover,
it facilitates finding the 2x2 submatrix of largest determinant rather than finding one with
a nonzero determinant. This reduces the rounding error in the factorization process and
hence improves the numerical accuracy of the results (Shanehchi and Evans [1982-1]).

The  major  part  of  the  algorithm  is  formulated  in  terms  of  three  basic  modules: 

Module 1 : Search for a nonsingular 2x2 submatrix
Input : $A \in R^{2 \times n}$.

Purpose : Find column indices $j_1$ and $j_2$ such that

\[ DET = A_{1,j_1} A_{2,j_2} - A_{2,j_1} A_{1,j_2} \ne 0. \]

Output : $j_1$, $j_2$, and DET or a singular indication.

Module 2 : Matrix - 2 vectors product
Input : $y^1 \in R^{n_1 \times 2}$, $A^1 \in R^{n_1 \times n_2}$, $x^1 \in R^{n_2 \times 2}$.

Purpose : Compute $y^1$ such that $y^1 \leftarrow y^1 + A^1 x^1$.

Output : $y^1$.

Module 3 : 2 vectors - matrix product
Input : $y^2 \in R^{2 \times l_2}$, $x^2 \in R^{2 \times l_1}$, $A^2 \in R^{l_1 \times l_2}$.

Purpose : Compute $y^2$ such that $y^2 \leftarrow y^2 + x^2 A^2$.

Output : $y^2$.

These modules represent a high level of granularity in the algorithm in the sense
that they are based on matrix-vector operations, $O(m^2)$ work, not just vector operations,
$O(m)$ work.
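As a rough, sequential illustration of the three modules, the following Python sketch may help. The function names, the optional `largest` flag, and the list-of-rows matrix representation are assumptions made here; column indices are 0-based, unlike the 1-based indices in the text.

```python
from itertools import combinations


def module1(A, largest=False):
    """A is a 2 x n array. Return (j1, j2, det) for a column pair with a
    nonzero determinant, or None as the singular indication. With
    largest=True the pair of largest absolute determinant is returned."""
    n = len(A[0])
    best = None
    for j1, j2 in combinations(range(n), 2):
        det = A[0][j1] * A[1][j2] - A[1][j1] * A[0][j2]
        if det == 0:
            continue
        if not largest:
            return j1, j2, det
        if best is None or abs(det) > abs(best[2]):
            best = (j1, j2, det)
    return best


def add_product(Y, X, A):
    """Return Y + X A for conforming matrices given as lists of rows."""
    return [[Y[r][c] + sum(X[r][k] * A[k][c] for k in range(len(A)))
             for c in range(len(Y[0]))] for r in range(len(Y))]


def module2(y1, A1, x1):
    """Matrix - 2 vectors product: y1 + A1 x1, with y1 of size n1 x 2."""
    return add_product(y1, A1, x1)


def module3(y2, x2, A2):
    """2 vectors - matrix product: y2 + x2 A2, with y2 of size 2 x l2."""
    return add_product(y2, x2, A2)
```

Each entry of the result of `module2` or `module3` is one independent inner product, which is where the concurrency of these modules comes from.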


Call Module 1 ($A$, $n$).

If $A$ is singular, then terminate with $B$ singular;
otherwise, permute columns
$W_{1:m,\,i}$ with $W_{1:m,\,j_1}$ and $W_{1:m,\,i+1}$ with $W_{1:m,\,j_2}$.
Record permutation, $IPVT(i) = j_1$, $IPVT(i+1) = j_2$.

2. Obtain a new $Z^i$.

$Z^i \leftarrow I$, where $I$ is $m \times m$, $S_{i,i}(Z^i) \leftarrow [S_{i,i}(W)]^{-1}$.
$n_1 \leftarrow m-i-1$, $n_2 \leftarrow i-1$.

'  < — fij  +2:m ,1  :i  +  l »  ^  ^  :i  +  l* 

For $l = 1,3,\ldots,i-2$

$A^1_{\cdot,\,l:l+1} \leftarrow Z^l_{i+2:m,\,l:l+1}$.

Next $l$.

Call Module 2 ($y^1$, $x^1$, $A^1$, $n_1$, $n_2$)

$Z^i_{i+2:m,\,i:i+1} \leftarrow y^1$.

3. Update rows $i+2$, $i+3$ of $W$.

$l_1 \leftarrow i+1$, $l_2 \leftarrow m-i+1$.

<4  -  ^  H  1  +  \ti  -f  *  y  ^  Pj  +  2:i  +3,i  +2:ffi  ■ 

For $l = 1,3,\ldots,i$

$x^2_{\cdot,\,l:l+1} \leftarrow Z^l_{i+2:i+3,\,l:l+1}$.

Next $l$.

Call Module 3 ($y^2$, $x^2$, $A^2$, $l_1$, $l_2$)

$W_{i+2:i+3,\,i+2:m} \leftarrow y^2$.

Next $i$.

4. Update $W$.

For $i = 1,3,\ldots,m-3$

$S_{i,i}(W) \leftarrow I$, where $I$ is $2 \times 2$.

$W_{i:i+1,\,i+2:m} \leftarrow S_{i,i}(Z^i)\, W_{i:i+1,\,i+2:m}$.

Next $i$.

The general approach we propose for parallel implementation involves having the
parent processor prepare the parameters for a module and make use of the kids (subtask
processors) to work concurrently on that module. In Module 1, at most $n(n-1)/2$ column
pairs should be checked. The parent sends to each kid the column indices to be checked
for nonsingularity, and stops all kids whenever one succeeds. As mentioned before, it is
possible to find the nonsingular 2x2 submatrix of largest determinant. To do this, the
parent sends the column indices to the kids, each kid finds the column pair of largest
determinant in its list and sends it to the parent, and the parent then selects the best
with only $p-1$ comparisons.
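The parent/kid scheme for the largest-determinant search can be sketched with a thread pool. This is only an illustration under stated assumptions: the names `best_in_list` and `parallel_best_pair`, the round-robin split of the pair list, and the use of Python threads in place of the shared-memory processors of the text are choices made here.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def best_in_list(A, pairs):
    """Work of one kid: the pair of largest |det| in its list, or None."""
    best = None
    for j1, j2 in pairs:
        det = A[0][j1] * A[1][j2] - A[1][j1] * A[0][j2]
        if det != 0 and (best is None or abs(det) > abs(best[2])):
            best = (j1, j2, det)
    return best


def parallel_best_pair(A, p):
    """Parent: split the n(n-1)/2 column pairs of the 2 x n array A among
    p kids, then pick the best of the (at most p) candidates returned."""
    pairs = list(combinations(range(len(A[0])), 2))
    chunks = [pairs[k::p] for k in range(p)]
    with ThreadPoolExecutor(max_workers=p) as pool:
        results = list(pool.map(lambda chunk: best_in_list(A, chunk), chunks))
    candidates = [r for r in results if r is not None]
    if not candidates:
        return None                      # singular indication
    return max(candidates, key=lambda r: abs(r[2]))
```

The final `max` over the candidates corresponds to the parent's selection step; the early-stop variant (stop all kids as soon as one finds a nonzero determinant) is not shown.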

The concurrency in Modules 2 and 3 is obvious since they involve matrix-vector
operations. In Module 2 (matrix - 2 vectors product) parallelism is obtained by
performing $2n_1$ independent inner products, where $n_1$ is the row dimension of the matrix.
Similarly, in Module 3 (2 vectors - matrix product) concurrency is gained by executing $2l_2$
independent inner products, where $l_2$ is the number of columns of the matrix. Step 3 needs
only $Z^i_{i+2:i+3,\,i:i+1}$ from Step 2. These are the first two rows of $y^1$. Thus, as soon as these
elements become available Step 3 may proceed. This can easily be synchronized. Finally,
in Step 4 the loop divides over $i$ with completely independent tasks. However, the tasks
require different amounts of computation. Two solutions are possible. Either we adopt
dynamic task queue allocation, or we statically allocate $i = 1, m-3$ to one processor,
$i = 3, m-5$ to the second, and so on.

4.2  Solving  the  Linear  Systems 

In this section we investigate the possible parallelism involved when we solve the
systems of equations $\pi B = c_B$ and $B y = A_{\cdot j}$. We assume that the basis matrix $B$ is in
the form (4.1), that is

\[ Z^{m-1} Z^{m-3} \cdots Z^1 B R = W, \]

where $Z^k$ has the form (4.2), and $W$ is a block unit upper triangular matrix with blocks of
size 2, that is, it has the form (2.36). We compute the dual variables ($\pi$) using the
following steps:

(1) Permutation : $\tilde{\pi} = c_B R$.

(2) Solve a block triangular system : $\hat{\pi} W = \tilde{\pi}$.

(3) BTRAN : $\pi = \hat{\pi} Z^{m-1} Z^{m-3} \cdots Z^1$.

We compute $y$, the basis representation of the incoming column $A_{\cdot j}$, as follows:

(1) FTRAN : $\tilde{y} = Z^{m-1} Z^{m-3} \cdots Z^1 A_{\cdot j}$.

(2) Solve a block triangular system : $W \hat{y} = \tilde{y}$.

(3) Permutation : $y = R \hat{y}$.
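Both three-step recipes follow directly from the factored form of the basis. A short derivation is sketched below; the shorthand $\mathcal{Z}$ for the product of the $Z^k$ factors and the names $\tilde{\pi}$, $\hat{\pi}$, $\tilde{y}$, $\hat{y}$ for the intermediate vectors are introduced here only for this sketch.

```latex
\begin{align*}
\mathcal{Z} = Z^{m-1} Z^{m-3} \cdots Z^{1}, \quad
\mathcal{Z} B R = W
  \;&\Longrightarrow\; B = \mathcal{Z}^{-1} W R^{-1}, \\
\pi B = c_B
  \;&\Longrightarrow\; (\pi \mathcal{Z}^{-1})\, W = c_B R
  \;\Longrightarrow\; \tilde{\pi} = c_B R, \quad
     \hat{\pi} W = \tilde{\pi}, \quad
     \pi = \hat{\pi}\, \mathcal{Z}, \\
B y = A_{\cdot j}
  \;&\Longrightarrow\; W (R^{-1} y) = \mathcal{Z} A_{\cdot j}
  \;\Longrightarrow\; \tilde{y} = \mathcal{Z} A_{\cdot j}, \quad
     W \hat{y} = \tilde{y}, \quad
     y = R\, \hat{y}.
\end{align*}
```

In each case the only nontrivial solve is the block triangular system with $W$; the remaining work consists of a permutation and a sequence of products with the factors $Z^k$.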

We  present  parallel  implementations  of  the  FTRAN  operation,  the  solution  of  a 
block  triangular  system,  and  the  BTRAN  operation  in  Sections  4.2.1,  4.2.2,  and  4.2.3, 
respectively. 


4.2.1  The  FTRAN  in  Parallel 

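The rules for applying a $Z^k$ to an arbitrary vector $v$ are as follows:

a) Extract $\alpha_k \leftarrow v_k$, and $\alpha_{k+1} \leftarrow v_{k+1}$.

b) Set $v_k \leftarrow 0$, and $v_{k+1} \leftarrow 0$.

c) Compute $\bar{v} = v + \alpha_k Z^k_{1:m,\,k} + \alpha_{k+1} Z^k_{1:m,\,k+1}$.

Note that if $v_k = v_{k+1} = 0$, then $\bar{v} = v$ and no element of $v$ will change.

An example is now given for $m = 6$, $k = 3$. Suppose we have

\[
Z^3_{\cdot,\,3:4} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 2 & 1 \\ 1 & 2 \\ 1/3 & 1/4 \\ 1/6 & 1/2 \end{bmatrix},
\quad \text{and} \quad
u = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \\ 6 \end{bmatrix},
\quad
v = \begin{bmatrix} 11 \\ 12 \\ 0 \\ 0 \\ 15 \\ 16 \end{bmatrix}.
\]

Then the computation of $Z^3 v$ is given by

\[ Z^3 v = v = (11, 12, 0, 0, 15, 16)^T, \]

and the computation of $Z^3 u$ is given by

\[ Z^3 u = (1, 2, 0, 0, 5, 6)^T + 3\,(0, 0, 2, 1, \tfrac{1}{3}, \tfrac{1}{6})^T + 4\,(0, 0, 1, 2, \tfrac{1}{4}, \tfrac{1}{2})^T = (1, 2, 10, 11, 7, \tfrac{17}{2})^T. \]

These rules are implemented in the following module:

Module : FTRAN Operation ($A$, $v$, $n$)

Purpose : Apply $Z^k$ to an arbitrary vector $v$.

Input : $n$, $A \in R^{n \times 2}$, $v \in R^{n \times 1}$.

Output : $\bar{v}$, where $\bar{v} = Z^k v$.

Steps : 1. Extract $\alpha_1 \leftarrow v_1$, and $\alpha_2 \leftarrow v_2$.

2. Set $v_1 \leftarrow 0$, and $v_2 \leftarrow 0$.

3. Compute $A_{\cdot 1} \leftarrow \alpha_1 A_{\cdot 1}$.

4. Compute $A_{\cdot 2} \leftarrow \alpha_2 A_{\cdot 2}$.

5. Compute $v \leftarrow v + A_{\cdot 1} + A_{\cdot 2}$.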
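The three rules translate into a few lines of Python. This is a minimal sketch under assumptions: the function name `ftran_apply`, the representation of the two nonidentity columns of $Z^k$ as a list of row pairs, and the data (taken as read from the example with $m = 6$, $k = 3$) are choices made here.

```python
from fractions import Fraction


def ftran_apply(z_cols, k, v):
    """Apply Z^k to v. z_cols holds columns k and k+1 of Z^k as m pairs
    (one pair per row); k is 1-based, as in the text."""
    v = [Fraction(x) for x in v]
    a1, a2 = v[k - 1], v[k]            # (a) extract the two multipliers
    if a1 == 0 and a2 == 0:
        return v                       # nothing changes
    v[k - 1] = v[k] = Fraction(0)      # (b) clear the two components
    # (c) add the two scaled columns; each row is independent of the others
    return [vi + a1 * c1 + a2 * c2 for vi, (c1, c2) in zip(v, z_cols)]


# Columns 3 and 4 of Z^3 for the example with m = 6.
Z3 = [(0, 0), (0, 0), (2, 1), (1, 2),
      (Fraction(1, 3), Fraction(1, 4)), (Fraction(1, 6), Fraction(1, 2))]
```

The row-wise list comprehension in rule (c) is the part that would be divided among the kids, each taking a block of rows.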


Obviously, steps 3 and 4 are independent and can be executed in parallel. In step 5,
the work is partitioned over the rows of $v$, assigning each kid a block of rows to evaluate.

4.2.2 Solving the Block Triangular System

The solution of an $m \times m$ triangular system of equations on a sequential computer
can be obtained by either a forward or backward substitution process which requires
$O(m^2)$ steps, each defined as one multiplication followed by one addition. In order to
solve the system on a parallel computer, methods which require $O(m^3)$ processors and,
hence, reduce the computation time to $O(\log^2 m)$ have been developed (e.g. Chen and
Kuck [1975-1] and Sameh and Brent [1977-1]). Evans and Dunbar [1983-1] introduced
methods that run in $O(m)$ time using $O(m)$ processors. For practical purposes the
processor and storage requirement of these methods is unreasonably large.

In this subsection we consider solving the linear system

\[ x W = b, \tag{4.4} \]

where $x, b \in R^m$ are row vectors and $W$ is an upper triangular $m \times m$ matrix with 2x2
identity diagonal blocks. This system may be solved by a forward substitution (FS) process
described in algorithmic form as follows.

For $i = 1,2,\ldots,m$

\[ x_i = b_i - \sum_{j=1}^{i-1} w_{ji}\, x_j. \]

Next $i$.

It is obvious that a uniprocessor will solve (4.4) sequentially in $m(m-2)/2$ steps by the
FS process. Let $T_p$ denote the time required to solve (4.4) using $p$ processors, where one
step requires one unit of time. Then

\[ T_1 = m(m-2)/2. \]

With a parallel computer that has $p$ processors, a minimum time requirement for the
solution of (4.4) is

\[ \min(T_p) = T_1 / p = m(m-2)/(2p). \tag{4.5} \]

The minimum completion time of any algorithm based on FS is equal to the number of
terms in the expression that evaluates $x_m$, that is

\[ T_{\min} = m-2. \]

From (4.5) it is clear that a minimum of $m/2$ processors is necessary to solve (4.4) in the
minimum time of $m-2$ operations. Again this processor requirement is unreasonably
large for our application.
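The step counts quoted above can be reproduced by counting the structurally nonzero terms of the substitution. The short sketch below is illustrative only (the function names are invented here); it skips the terms that vanish because of the 2x2 identity diagonal blocks.

```python
def fs_step_count(m):
    """Multiply-add steps of forward substitution for x W = b when W is upper
    triangular of even order m with 2x2 identity diagonal blocks."""
    steps = 0
    for i in range(1, m + 1):
        for j in range(1, i):
            # w_ji is structurally zero when i and j lie in the same 2x2 block
            if (j + 1) // 2 != (i + 1) // 2:
                steps += 1
    return steps


def terms_in_last_component(m):
    """Number of terms needed to evaluate x_m, the longest dependent chain."""
    return sum(1 for j in range(1, m) if (j + 1) // 2 != (m + 1) // 2)
```

Dividing the total count by the chain length gives $m/2$, the number of processors needed to reach the minimum time.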

The  machine  we  consider  has  a  limited  number  of  identical  processors  ( p  <30). 
Therefore,  we  consider  the  question:  if  we  are  given  a  fixed  number  of  processors,  how 
should  the  parallel  operations  be  scheduled  on  the  processors  to  minimize  the  solution 
time  of  (4.4)?  We  propose  to  answer  this  question  using  a  directed  graph  model  that 
represents  the  FS  process  as  follows.  The  nodes  of  the  graph  represent  tasks  of  equal  exe¬ 
cution  time  and  the  edges  represent  the  precedence  relationships  between  the  tasks.  Then 
we  apply  a  simple  scheduling  algorithm  due  to  Hu  [1961-1],  called  the  level  algorithm,  to 
schedule  the  tasks  on  the  processors  such  that  the  total  execution  time  is  minimized.  This 
algorithm is known to be optimum for a tree graph, and it gives extremely good results
for general graphs as reported by Ramamoorthy et al. [1972-1], Huang and Wing [1979-1],
and Wing and Huang [1980-1].

We first organize the FS process in terms of operations of equal time and define the
corresponding directed graph. Let $x^i = [x_i, x_{i+1}]$. Partition $x$, $b$, and $W$ into blocks of
size 2. Using $S_{i,j}$ as defined in (4.3), the above FS process can then be written as

For $i = 1,3,\ldots,m-1$

\[ x^i = b^i - \sum_{j=1,3,\ldots,i-2} x^j S_{j,i}(W). \]

Next $i$.

Let the following operation, where $x^i$ is used to update $x^j$, define a task

\[ x^j \leftarrow x^j - x^i S_{i,j}(W). \]

For Hu's algorithm we assume that the execution time of an operation (4.5) is one unit (4
multiplications and 4 additions). We can see that the FS process consists of a set of
operations (4.5), on which a set of precedence relations exists. That is, to complete the
evaluation of $x^i$ we require $x^{i-2}$, for $i = 3,5,\ldots,m-1$. The process can therefore be
represented by a directed graph $G(V,E)$ where the vertex set $V$ is defined as

\[ V = \{ v_{i,j} \mid v_{i,j} \text{ represents an operation (4.6)} \}, \]

and the edge set $E$ is defined as

\[ E = \{ (v_{i,j}, v_{k,l}) \mid \text{operation } v_{k,l} \text{ requires the direct result of operation } v_{i,j} \}. \]

We shall call $G(V,E)$ the forward substitution task graph, and refer to it by FSTG.
In Figure 4.1 the FSTG for $m = 10$ is presented. For every $v_{i,j}$ in the FSTG, the pair $i,j$ is
indicated. A node is an initial node if it does not have a predecessor and is a terminal
node if it has no successor. It is clear that the FSTG has only one terminal node, at which
$i = m-3$ and $j = m-1$. Accordingly, the minimum completion time, denoted by $D$, of the
FSTG is equal to the number of nodes on the longest path from an initial node to the
terminal node. Thus, $D = (m/2) - 1$, which is the number of times operation (4.6) is
executed for $x^{m-1}$.

We next determine the levels of the vertices of the FSTG. Define the level number
($l_{i,j}$) of a node $v_{i,j}$ as follows: 1) the level of the terminal node is $D$, 2) the level of a
node that has one or more successors is equal to the minimum of the levels of its
successors minus one. Applying this definition to the FSTG, we can conclude that

\[ l_{i,j} = (i+1)/2. \tag{4.6} \]

The level number is simply the latest time by which node $v_{i,j}$ must be processed in order
to complete the task graph in the minimum time $D$. The level numbers of the nodes of
Figure 4.1 are given as shown.

Once the level numbers of the operations are determined, we apply Hu's scheduling
algorithm to assign operations to processors. Define a ready task to be one whose
immediate predecessors have all been processed. The scheduling algorithm is as follows.

Algorithm 4.2: Hu's Scheduling Algorithm

1. Among all the ready tasks, schedule the one with smallest level number.

2. If there is a tie, schedule the one with the largest number of immediate successors.
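The two rules can be tried on a small task graph. The Python sketch below is an illustration under assumptions, since Figure 4.1 is not reproduced here: the precedence relations are one plausible reading of the FSTG (task $v_{i,j}$ precedes the next update $v_{i+2,j}$ of the same block, and $v_{i,i+2}$ precedes every task that uses the completed block $x^{i+2}$), and the names `fstg` and `hu_schedule` are invented for this example.

```python
def fstg(m):
    """Tasks v[i, j] (update block x^j with block x^i; 1-based odd indices)
    and their immediate successors, under the reading described above."""
    tasks = [(i, j) for i in range(1, m - 2, 2) for j in range(i + 2, m, 2)]
    succ = {}
    for (i, j) in tasks:
        if j == i + 2:       # x^(i+2) is complete once this task is done
            succ[(i, j)] = [(j, l) for l in range(j + 2, m, 2)]
        else:                # next update of the same block x^j
            succ[(i, j)] = [(i + 2, j)]
    return tasks, succ


def hu_schedule(m, p):
    """List-schedule the FSTG on p processors; returns one batch per time unit."""
    tasks, succ = fstg(m)
    level = {(i, j): (i + 1) // 2 for (i, j) in tasks}
    waiting = {t: 0 for t in tasks}
    for t in tasks:
        for s in succ[t]:
            waiting[s] += 1
    ready = [t for t in tasks if waiting[t] == 0]
    steps = []
    while ready:
        # rule 1: smallest level first; rule 2: most immediate successors first
        ready.sort(key=lambda t: (level[t], -len(succ[t]), t))
        batch, ready = ready[:p], ready[p:]
        steps.append(batch)
        for t in batch:
            for s in succ[t]:
                waiting[s] -= 1
                if waiting[s] == 0:
                    ready.append(s)
    return steps
```

With this reading, the task $v_{i,i+2}$ has at least as many immediate successors as any other task with the same first index, so the tie-breaking rule schedules it first within its level.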


Applying  this  Algorithm  to  the  FS  process  represented  by  FSTG,  the  computations 
are  organized  as  follows. 

Algorithm 4.3 : Forward Substitution

Set $x^1 \leftarrow b^1$.
For k = 3, 5, ..., m-1
    $x^k \leftarrow b^k - x^1 S_{1,k}(W)$.
Next k.
For i = 3, ..., m-3
    For j = i+2, i+4, ..., m-1
        $x^j \leftarrow x^j - x^i S_{i,j}(W)$.
    Next j.
Next i.

All operations in loop k are independent and have the same level number. Their
level number ($l_{1,k} = 1$) is the smallest among all operations in the Algorithm, and
hence they are executed first. Similarly, all operations in loop j are independent and have
the same level number as given by (4.6). The ordering of index i dictates the execution
of the operations by increasing level number. This satisfies the first criterion in Hu's
Algorithm. The second criterion imposes the ordering of the index j. That is, the number
of immediate successors of $v_{i,j}$ is always greater than or equal to that of $v_{i,j+2}$ for
j = i+2, i+4, ..., m-1.


A  parallel  implementation  of  Algorithm  4.3  involves  having  the  parent  processor 
partition  the  work  in  loop  k  among  the  kids.  Then  for  every  i ,  the  computational  tasks  of 
loop  j  are  again  divided  among  the  kids. 


Lower bounds on the completion time of a task graph given a fixed number of
processors were derived by Ramamoorthy et al. [1972-1]. Let $n_k$ be the number of nodes in
level k. Let $t^*(p)$ be the minimum completion time to process a task graph with p
processors. Then

$$t^*(p) \ge \max_i \left\{ \left\lceil \frac{\sum_{k=1}^{i} n_k}{p} \right\rceil + D - i \right\} \qquad (4.7)$$

where D is the minimum completion time of the task graph and $\lceil x \rceil$ denotes the smallest
integer $\ge x$. The first term in the expression denotes the minimum number of time units
required to complete all the operations of the first i levels using p processors. The term
D - i is equal to the number of remaining levels yet to be processed. This bound may be
useful in demonstrating optimality of the scheduling using Hu's Algorithm.
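
As an illustration of the bound (4.7), here is a minimal Python sketch, not taken from the original report. The function name `completion_time_lower_bound` is hypothetical, and the level counts [4, 3, 2, 1] are an assumption: they are the counts one obtains for the m = 10 FSTG if level i contains the nodes $v_{i,j}$ generated by the loops of Algorithm 4.3.

```python
def completion_time_lower_bound(level_counts, p):
    """Evaluate the right-hand side of (4.7).

    level_counts[k-1] holds n_k, the number of nodes in level k (k = 1..D);
    p is the number of processors.
    """
    D = len(level_counts)
    best = 0
    done = 0                                  # n_1 + ... + n_i
    for i, n_k in enumerate(level_counts, start=1):
        done += n_k
        ceil_term = -(-done // p)             # integer ceiling of done / p
        best = max(best, ceil_term + D - i)
    return best


counts = [4, 3, 2, 1]                         # assumed counts for m = 10
for p in (1, 2, 4):
    print(p, completion_time_lower_bound(counts, p))
```

With these counts the bound is 10 time units on one processor, 6 on two, and 4 on four; the last value equals D, so additional processors cannot reduce it further.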


4.2.3  Parallel  Implementation  of  the  BTRAN  Operation 

In this section, we consider the parallel implementation of the following operation

$$\bar{\pi} = \pi Z^{m-1} Z^{m-3} \cdots Z^{1},$$

where $\pi$ is an arbitrary vector of m elements and each $Z^k$ is an m x m rank-2 matrix that
has the form (4.3).

The rule for computing $\bar{u} = u Z^k$ is as follows:

a)  Set $\bar{u}_i \leftarrow u_i$ for $i \ne k$ and $i \ne k+1$.

b)  Set $\bar{u}_k \leftarrow u_{k:m} Z^k_{k:m,k}$.

c)  Set $\bar{u}_{k+1} \leftarrow u_{k:m} Z^k_{k:m,k+1}$.

For example, let m = 6, k = 3 and suppose we have

$$Z^3_{\cdot,3:4} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 2 & 1 \\ 1 & 2 \\ 3 & 4 \\ 6 & 2 \end{bmatrix} \quad \text{and} \quad u = [\, 1 \;\; 1 \;\; 1 \;\; 1 \;\; 1 \;\; 1 \,].$$

Then $\bar{u} = u Z^3 = [\, 1 \;\; 1 \;\; 12 \;\; 9 \;\; 1 \;\; 1 \,]$.
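
The rule a) through c) can be traced on the example with a few lines of Python. This is a minimal sketch, not part of the original report; the function name `apply_zk` is hypothetical, and only columns k and k+1 of $Z^k$ are passed in, since the rule touches no other entries.

```python
def apply_zk(u, k, col_k, col_k1):
    """Return u * Z^k, given columns k and k+1 of Z^k (k is 1-based)."""
    m = len(u)
    out = list(u)                                   # rule a): copy all entries
    rows = range(k - 1, m)                          # rows k..m in 0-based form
    out[k - 1] = sum(u[r] * col_k[r] for r in rows)   # rule b)
    out[k] = sum(u[r] * col_k1[r] for r in rows)      # rule c)
    return out


col3 = [0, 0, 2, 1, 3, 6]     # column 3 of Z^3 from the example
col4 = [0, 0, 1, 2, 4, 2]     # column 4 of Z^3 from the example
print(apply_zk([1] * 6, 3, col3, col4))   # [1, 1, 12, 9, 1, 1]
```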


Note that $\bar{u}$ differs from u in only the k-th and the (k+1)-th elements. Note also that the
elements $u_j$, j = 1, ..., k-1, are not required in computing $\bar{u}$. Using these observations,
the BTRAN process may be represented by the following.


For k = m-1, ..., 1
    $u_k \leftarrow u_{k:m} Z^k_{k:m,k}$.
    $u_{k+1} \leftarrow u_{k:m} Z^k_{k:m,k+1}$.
Next k.


We now apply the methodology stated at the end of the previous subsection. Let the
following operations define a task

$$u^k \leftarrow u^k S_{k,k}(Z^k). \qquad (4.8)$$

$$u^j \leftarrow u^j + u^i S_{i,j}(Z^j). \qquad (4.9)$$

We assume that the execution time of both operations is one unit. The task graph
G(V,E) of the BTRAN process is defined by the vertex set V, where

$$v_{i,j} \text{ represents } \begin{cases} \text{an operation (4.8)}, & \text{if } i = j; \\ \text{an operation (4.9)}, & \text{otherwise} \end{cases}$$

and the edge set E, where

$$E = \{ (v_{i,j}, v_{k,l}) \mid \text{operation } v_{k,l} \text{ requires the direct result of operation } v_{i,j} \}.$$

G(V,E) has only one terminal node at which i = 3 and j = 1. Following the same
arguments used earlier with FSTG, we conclude that

$$D = m/2,$$

and

$$l_{i,j} = \begin{cases} 1, & \text{if } i = j; \\ (m - i + 3)/2, & \text{otherwise.} \end{cases}$$


Applying  Hu’s  Algorithm  to  the  BTRAN  task  graph  yields  the  following  ordering 
of  computations. 

Algorithm 4.4 : BTRAN Operation

For k = m-1, m-3, ..., 1
    $u^k \leftarrow u^k S_{k,k}(Z^k)$.
Next k.
For i = m-1, m-3, ..., 3
    For j = i-2, i-4, ..., 1
        $u^j \leftarrow u^j + u^i S_{i,j}(Z^j)$.
    Next j.
Next i.


The  ordering  of  the  index  i  is  imposed  by  the  first  criterion  of  Hu’s  Algorithm.  The 
ordering  of  the  indices  k  and  j  is  the  result  of  applying  the  second  criterion.  Parallelism 
is gained by having the kid processors work first on loop k in parallel, and then for every
i, having the kid processors work on loop j in parallel.


V.  SUMMARY


Evans and Hatzopoulos [1979-1] developed a new matrix factorization, known as
the Quadrant Interlocking Factorization (QIF), for solving linear systems on parallel
computers. In this paper we have presented the algorithms required to use this new
factorization in Dantzig's simplex algorithm for linear programming. This work may be
viewed as a parallelization of the simplex method using a quadrant interlocking
factorization for the basis inverse.

In Section II, the factorization algorithms are developed, and the relationship of
quadrant and triangular matrices is presented. In Section III, a new algorithm is presented
for updating the factorization during a basis exchange step. In Section IV, we present a
parallel implementation of the factorization algorithm, and develop the algorithms
required to solve the linear systems of the simplex method on a parallel computer using
the QIF of the basis. For each algorithm the concurrency among the steps is revealed, the
computations are organized and a parallel implementation is proposed. The algorithms
are designed for an MIMD parallel computer that incorporates p identical processors
sharing a common memory and capable of applying all their power to a single
application in a timely and coordinated manner.


ABSTRACT


The objective of this investigation is to computationally test parallel
algorithms for finding minimal spanning trees. Computational tests were run on
a single processor using Prim's, Kruskal's and Boruvka's algorithms. Our
implementation of Prim's algorithm is superior for high density graphs, while
our implementation of Boruvka's algorithm is best for sparse graphs.
Implementations of parallel versions of both Prim's and Boruvka's algorithms were
tested on a twenty-cpu Balance 21000. For the environment in which a minimum
spanning tree problem is a subproblem within another algorithm, the parallel
implementation of Boruvka's algorithm produced speedups of three and five on
five and ten processors, respectively, while the parallel implementation of
Prim's algorithm produced speedups of three and five on five and ten
processors, respectively. The one-time overhead for process creation negates
most, if not all, of the benefits for solving a single minimum spanning tree
subproblem.


I.  INTRODUCTION


The  United  States  along  with  other  developed  countries  is  entering  a  new 
generation  of  computing  that  will  require  software  engineers  to  redesign  and 
reevaluate  standard  algorithms  for  the  new  parallel  processing  hardware  that  is 
being  installed  throughout  the  developed  world.  It  may  well  be  that  algorithms 
which  proved  to  be  superior  for  single  processor  machines  may  prove  to  be 
inferior  in  some  of  the  new  parallel  processing  environments.  One  of  the  more 
popular new parallel machines is Sequent Computer Systems' Balance 21000. The
objective  of  this  investigation  is  to  computationally  test  parallel  algorithms 
for  finding  minimal  spanning  trees  on  a  twenty-cpu  Balance  21000. 

An undirected graph G = [V,E] consists of a vertex set V and an edge set
E. Without loss of generality we assume that the edges are distinct. If G' =
[V',E'] is a subgraph of G with V' = V, then G' is called a spanning subgraph
for G. If, in addition, G' is a tree, then G' is called a spanning tree for G.
A graph whose components are trees is called a forest, and a spanning subgraph
for G, which is also a forest, is called a spanning forest for G. We will call
$\{[V_i, T_i] : V_i = \{u_i\},\ T_i = \emptyset,\ u_i \in V\}$ the trivial spanning forest for G and the
$[V_i, T_i]$ trivial trees. Associated with each edge (u,v) is a real-valued cost
c(u,v). The minimum spanning tree problem may be stated as follows: Given a
connected undirected graph each of whose edges has a real-valued cost, find a
spanning tree of the graph whose total edge cost is minimum.

Applications  include  the  design  of  a  distribution  network  in  which  the 
nodes  represent  cities  or  towns  and  the  edges  represent  electrical  power  lines, 
water  lines,  natural  gas  lines,  communication  links,  etc.  The  objective  is  to 
design  a  network  which  uses  the  least  length  of  cable  or  pipe.  The  minimum 
spanning  tree  problem  is  also  used  as  a  subproblem  for  algorithms  for  the 
travelling salesman problem (see Held and Karp [6, 7] and Ali and Kennington
[3]). Some vehicle routing algorithms require the solution of a travelling
salesman problem on a subset of nodes. Hence, a wide variety of applications
require the solution of minimal spanning trees. Some applications require a
single solution and some use the model as a subproblem within another
algorithm.


II.  THREE CLASSICAL ALGORITHMS


The algorithms in current use may be traced to ideas developed by Prim,
Kruskal, and Boruvka. These three classical algorithms all begin with the
trivial spanning forest $G_0 = \{[V_i, T_i],\ i = 0, \ldots, |V|-1\}$. A sequence of
spanning forests is obtained by merging spanning forest components. Given
spanning forest $G_k$, a nonforest edge (u,v) is selected and the components
$[V_i, T_i]$ and $[V_j, T_j]$ with $u \in V_i$ and $v \in V_j$ are removed from $G_k$ and replaced by
$[V_\ell, T_\ell]$, where $\ell = k + |V|$, $V_\ell = V_i \cup V_j$, and $T_\ell = T_i \cup T_j \cup \{(u,v)\}$, yielding
spanning forest $G_{k+1}$. After m = |V| - 1 edges have been selected, $G_m =
\{[V_{2m}, T_{2m}]\} = \{[V,T]\}$ is a minimal spanning tree for G.

Let $[V_i, T_i]$ and $[V_j, T_j]$ denote two disjoint subtrees of G. Define $d_{ij}$,
the shortest distance between the trees, by $d_{ij} = \min \{c(u,v) : (u,v) \in E,\ u \in
V_i,\ v \in V_j\}$. The three classical algorithms may be viewed as different
applications of the following result:

Proposition 1.

Let $V_0, V_1, \ldots, V_n$ denote vertex sets of disjoint subtrees of a minimum
spanning tree for G. Let $c(u,v) = d_{jn} = \min_i d_{in}$ with $(u,v) \in V_j \times V_n$. Then
(u,v) is an edge in a minimal spanning tree for G.

A  proof  of  Proposition  1  may  be  found  in  Christofides  [4,  pp.  135-136]. 

In Prim's algorithm, the nonforest edge (u,v) for $G_k$ is always selected so
that $(u,v) \in V_i \times V_{j^*}$, where $j^*$ is the largest index j such that $[V_j, T_j] \in G_k$.
Thus a single component continues to grow as trivial trees disappear. An
excellent description of Prim's algorithm is given in Papadimitriou and Steiglitz
[15, p. 273], along with its (serial) computational complexity of $O(|V|^2)$. It
is believed that this algorithm is best suited for dense graphs.

In Boruvka's algorithm, the nonforest edge (u,v) for $G_k$ is always selected
so that $(u,v) \in V_{i^*} \times V_j$, where $i^*$ is the smallest index i such that $[V_i, T_i] \in
G_k$. Thus a variety of different-sized components may be produced as the
algorithm proceeds. All trivial trees will be removed first in the early
stages of this algorithm. A description of Boruvka's algorithm is given in
Papadimitriou and Steiglitz [15, p. 277], along with its (serial)
computational complexity of $O(|E| \log |V|)$. This algorithm appears to be best suited
for sparse graphs.

Kruskal's method may be viewed as an application of the greedy algorithm.
The minimum spanning tree is constructed by examining the edges in order of
increasing cost. If an edge forms a cycle within a component of $G_k$, it is
discarded. Otherwise it is selected and yields $G_{k+1}$. Here also
different-sized components may be produced. A description of Kruskal's algorithm is
given in Sedgewick [18, pp. 412-413], along with its (serial) computational
complexity of $O(|E| \log |E|)$.
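
The greedy edge selection just described can be sketched in a few lines of Python. This is an illustrative sketch only, not the KRUSKAL code discussed in this report: it sorts all edges up front rather than using a partial quick sort, and the union-find helper and the name `kruskal` are assumptions introduced for the example.

```python
def kruskal(n, edges):
    """Greedy minimum spanning tree.

    n: number of vertices, labelled 0..n-1.
    edges: list of (cost, u, v) tuples.
    Returns (total_cost, list_of_tree_edges).
    """
    parent = list(range(n))          # union-find forest over the components

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    tree, total = [], 0
    for cost, u, v in sorted(edges):        # examine edges by increasing cost
        ru, rv = find(u), find(v)
        if ru != rv:                        # no cycle: the edge is selected
            parent[ru] = rv
            tree.append((u, v))
            total += cost
            if len(tree) == n - 1:
                break
    return total, tree


edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
print(kruskal(4, edges))   # (6, [(0, 1), (1, 3), (1, 2)])
```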


III.  COMPUTATIONAL  RESULTS  WITH  SEQUENTIAL  ALGORITHMS 


Computer codes for Boruvka's algorithm, Kruskal's algorithm, and three
versions of Prim's algorithm were developed. SPARSE PRIM maintains the edge
data in both forward and backward star format, while DENSE PRIM maintains the
edge data in a |V| x |V| matrix. HEAP PRIM maintains the edge data in both
forward and backward star format and makes use of a d-heap as described in
Tarjan [19, p. 77]. KRUSKAL makes use of a partial quick sort as described in
[1, 8] to produce the least cost remaining edge. BORUVKA is a straightforward
implementation of the algorithm presented in [15].

Random  problems  were  generated  on  both  n  x  n  grid  graphs  and  on  completely 
random  graphs.  All  costs  were  uniformly  distributed  on  the  interval 
[0,  maxcost].  All  codes  are  written  in  FORTRAN  for  the  Balance  21000. 

The  computational  results  for  grid  graphs  are  presented  in  Table  1.  These 
graphs  are  very  sparse  and  BORUVKA  was  the  clear  winner.  The  computational 
results  for  random  graphs  may  be  found  in  Tables  2  and  3.  SPARSE  PRIM  was  the 
winner  for  problems  whose  density  was  at  least  40%  with  HEAP  PRIM  running  a 
close  second.  For  problems  with  densities  of  20%  or  less,  HEAP  PRIM  was  the 
winner  with  KRUSKAL  running  a  close  second.  KRUSKAL  appeared  to  be  the  most 
robust  implementation,  working  fairly  well  on  all  problems  tested. 


Tables  1,  2,  3  About  Here 


IV.  PARALLEL  ALGORITHMS 


Parallel versions of the three classical algorithms have appeared in the
literature (see [2, 5, 9, 10, 11, 12, 16, 17]); however, no computational
experience has been reported. The overhead required for coordinating the work
of multiple processors can only be determined by actual implementation on a
parallel processing machine.

A parallel version of Boruvka's algorithm was developed for grid graphs
and a parallel version of Prim's algorithm was developed for high density
random graphs. Both algorithms use modules (subroutines) which may be executed
in parallel. Suppose there are p processors available for use. The parallel
operations are initiated by the main program using statements of the form:

for m = 1 to p, fork module z(m).

The main program and p-1 clones will each execute module z in parallel.
Processing does not continue in the main program until all processors complete
module z. The argument "m" allows each of the p processors to process
different parts of the data or follow a different path. We assume that all
data in the main program is shared with module z. If module z has local
non-shared variables, then these will be explicitly stated in the description of
the module. Multiple processors which update the same variable, set, or list
use locks to ensure that only one processor has access to a given item.


4.1  Parallel Boruvka For Grids

Using the fork and lock constructs we present a parallelization of Boruvka's
algorithm for grid graphs. The most expensive component of Boruvka's
sequential algorithm may be described by the following procedure:

    for all (u,v) ∈ E
        let i and j denote the subtrees containing u and v, respectively;
        if i ≠ j then
            if cost(u,v) < min(i) then min(i) ← cost(u,v)
            if cost(u,v) < min(j) then min(j) ← cost(u,v)
        end if
    end for

That is, all the edge costs must be examined and certain subtree data are
updated. Our parallelization of this scan relies upon a partitioning of the
grid into p components (one for each processor). A three processor
partitioning of a 7 x 7 grid network is illustrated in Figure 1.

Figure  1  About  Here 

The  above  edge  scan  is  performed  in  two  stages.  The  first  stage  performs 
a  parallel  scan  over  edges  both  of  whose  vertices  lie  within  the  same  partition. 
The  second  stage  performs  a  parallel  scan  over  edges  across  cut  sets.  If  each 
partition consists of at least two rows of the grid, then all subtree data
updating can be performed independently without the requirement of a lock.

The  second  part  of  Boruvka's  algorithm  is  to  merge  two  subtrees  by 
appending  a  new  edge.  The  merger  of  subtrees,  both  of  which  lie  in  the  same 
partition  can  also  be  executed  in  parallel. 
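
The two parts described above, the edge scan and the merge, can be put together in a short sequential Python sketch. It is illustrative only and is not the BORUVKA code of this study: it assumes distinct edge costs, tracks subtrees with a union-find array rather than explicit sets, and the name `boruvka` is hypothetical.

```python
def boruvka(n, edges):
    """Sequential Boruvka sketch.

    n: number of vertices, labelled 0..n-1.
    edges: list of (u, v, cost) with distinct costs (assumed).
    Returns the set of selected tree edges.
    """
    comp = list(range(n))            # union-find labels of the subtrees

    def find(x):
        while comp[x] != x:
            comp[x] = comp[comp[x]]
            x = comp[x]
        return x

    tree = set()
    ncomp = n
    while ncomp > 1:
        shortest = {}                # subtree label -> least cost incident edge
        for u, v, c in edges:        # part 1: the edge scan
            i, j = find(u), find(v)
            if i != j:
                if i not in shortest or c < shortest[i][2]:
                    shortest[i] = (u, v, c)
                if j not in shortest or c < shortest[j][2]:
                    shortest[j] = (u, v, c)
        if not shortest:
            break                    # graph is not connected
        for u, v, c in shortest.values():   # part 2: the merge
            i, j = find(u), find(v)
            if i != j:               # skip an edge already used by its twin
                comp[i] = j
                tree.add((u, v, c))
                ncomp -= 1
    return tree


edges = [(0, 1, 1), (0, 2, 4), (1, 2, 3), (1, 3, 2), (2, 3, 5)]
print(sorted(boruvka(4, edges)))   # [(0, 1, 1), (1, 2, 3), (1, 3, 2)]
```

In the parallel version the edge scan is what gets split across partitions, since each pass must touch every edge.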

Using this data partitioning approach, the parallel algorithm may be
stated as follows:

PARALLEL BORUVKA FOR GRIDS


Input:   1.  An n x n grid graph G = [V,E] with V = {v_1, ..., v_q}.
         2.  For each edge (u,v) ∈ E a cost c(u,v).
         3.  The number of processors, p, available for use.

Output:  A minimal spanning tree [V,T].

Assumption:  G is connected and has no parallel edges.

begin
    T ← ∅, r ← ⌊n/p⌋, ℓ ← n - rp;
    if r < 2, terminate.
    for i = 1 to q, S_i ← {v_i};
    C ← {S_1, ..., S_q};
    W_1 ← {v: v ∈ V and v is in grid rows 1 through r + ℓ};
    for m = 2 to p,
        W_m ← {v: v ∈ V and v is in grid rows (m-1)r + ℓ + 1 through mr + ℓ};
    for m = 1 to p, X_1m ← {(u,v): (u,v) ∈ E, u ∈ W_m, and v ∈ W_m};
    for m = 1 to p-1,
        X_2m ← {(u,v): (u,v) ∈ E with u ∈ W_m, v ∈ W_m+1 or u ∈ W_m+1, v ∈ W_m};
    for i = 1 to q, cpu(i) ← m, where v_i ∈ W_m;
    (comment: S_1, ..., S_q are assigned to the p processes)
    create p-1 clones
    (comment: create p-1 additional processes and place them in the wait
     state)
    while |C| ≠ 1
        for m = 1 to p, fork module edgescan(1,m);
        (comment: forks are executed in parallel and processing does not continue
         in the main program until all processes complete edgescan)
        for m = 1 to p-1, fork module edgescan(2,m);
        L ← ∅;
        for m = 1 to p, fork module merge(m);
        for all (u,v) ∈ L do
            let S_i and S_j be the sets containing u and v, respectively;
            if |S_i| < |S_j| then
                S_i ← S_i ∪ S_j, C ← C\S_j;
            else
                S_j ← S_i ∪ S_j, C ← C\S_i;
            end if
            T ← T ∪ {(u,v)};
        end for
    end while
    kill the clones
end

module edgescan(k,m)
begin
    (comment: k = 1 implies the scan is within partition m,
     k = 2 implies the scan is across the cut set separating partitions
     m and m + 1)
    for all (u,v) ∈ X_km
        let S_i, S_j be the sets containing u and v, respectively;
        if i ≠ j then
            if c(u,v) < min(i) then min(i) ← c(u,v), shortest(i) ← (u,v);
            if c(u,v) < min(j) then min(j) ← c(u,v), shortest(j) ← (u,v);
        end if
        (comment: shortest(i) is the least cost edge incident on S_i)
    end for
end

module merge(m)
begin
    for all v_k ∈ W_m do
        (u,v) ← shortest(k)
        let S_i, S_j be the sets containing u and v, respectively;
        if i ≠ j then
            if cpu(i) = cpu(j) then
                if |S_i| < |S_j| then
                    S_i ← S_i ∪ S_j, C ← C\S_j;
                else
                    S_j ← S_i ∪ S_j, C ← C\S_i;
                end if
                lock T
                T ← T ∪ {(u,v)}
                unlock T
            else
                lock L
                L ← L ∪ {(u,v)}
                unlock L
            end if
        end if
    end for
end


4.2  Parallel  Prim 


The most expensive part of Prim's sequential algorithm is to find a
minimum entry in a |V|-length array. This search can be allocated over p
processors, each of which finds a candidate minimum. The best of the p candidates
becomes the global minimum. Under the assumption that parallel edges do not
exist, there is also a scan of edges over the forward and backward star of a
given node which can be executed in parallel. Data partitioning via the use of
independent cut sets could also be used for random graphs in a manner similar
to that described in Section 4.1. That has not been done in this
investigation.

The  parallelization  of  Prim's  algorithm  may  be  stated  as  follows: 


PARALLEL PRIM

Input:   1.  A graph G = [V,E] with V = {v_1, ..., v_n}.
         2.  For each edge (u,v) ∈ E, a cost c(u,v).
         3.  The number of processors, p, available for use.

Output:  A minimal spanning tree, [V,T].

Assumption:  G is connected and has no parallel edges.

begin
    U ← {v_1}, w ← v_1, T ← ∅;
    for i = 1 to n, d(i) ← ∞;
    create p-1 clones
    (comment: create p-1 additional processes and place them in a wait
     state)
    F ← {(w,v) ∈ E};
    partition F into mutually exclusive sets F_1, ..., F_s, s ≤ p;
    for m = 1 to s, fork module forwardscan(m);
    B ← {(u,w) ∈ E};
    partition B into mutually exclusive sets B_1, ..., B_t, t ≤ p;
    for m = 1 to t, fork module backwardscan(m);
    while U ≠ V do
        globalmin ← ∞;
        for m = 1 to p, fork module nodescan(m);
        (comment: forks are executed in parallel and processing does not
         continue in the main program until all processes complete
         nodescan)
        T ← T ∪ {e(ibest)}, U ← U ∪ {w};
        F ← {(w,v) ∈ E};
        partition F into mutually exclusive sets F_1, ..., F_s, s ≤ p;
        for m = 1 to s, fork module forwardscan(m);
        B ← {(u,w) ∈ E};
        partition B into mutually exclusive sets B_1, ..., B_t, t ≤ p;
        for m = 1 to t, fork module backwardscan(m);
    end while
    kill the clones
end

module nodescan(m)
local data: min, x
begin
    min ← ∞
    for i = m to n step p do
        if d(i) < min then min ← d(i), x ← i;
    end for
    lock globalmin
    if min < globalmin then globalmin ← min, ibest ← x, w ← v_x
    unlock globalmin
end

module forwardscan(m)
begin
    for all (u,v) ∈ F_m do
        if c(u,v) < d(v) then d(v) ← c(u,v), e(v) ← (u,v)
    end for
end

module backwardscan(m)
begin
    for all (u,v) ∈ B_m do
        if c(u,v) < d(u) then d(u) ← c(u,v), e(u) ← (u,v)
    end for
end
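
To make the partitioned minimum search concrete, the following Python sketch simulates it sequentially on a cost matrix. It is illustrative only and is not the code used in this study: the p "processors" are emulated by p strided passes, vertices already in the tree are skipped explicitly (a detail the statement above leaves implicit), a connected graph is assumed, and the name `prim_partitioned` is hypothetical.

```python
import math


def prim_partitioned(cost, p):
    """Prim's algorithm with the minimum search split into p strided scans.

    cost: symmetric n x n matrix, math.inf where no edge exists.
    Returns the list of tree edges in the order they were selected.
    """
    n = len(cost)
    in_tree = [False] * n
    d = [math.inf] * n      # d[i]: least known cost linking vertex i to the tree
    e = [None] * n          # e[i]: the edge that achieves d[i]
    w = 0
    in_tree[w] = True
    tree = []
    for _ in range(n - 1):
        # scan the star of the newest tree vertex w
        for v in range(n):
            if not in_tree[v] and cost[w][v] < d[v]:
                d[v], e[v] = cost[w][v], (w, v)
        # nodescan: "processor" m examines entries m, m+p, m+2p, ...
        candidates = []
        for m in range(p):
            best = min((i for i in range(m, n, p) if not in_tree[i]),
                       key=lambda i: d[i], default=None)
            if best is not None:
                candidates.append(best)
        w = min(candidates, key=lambda i: d[i])   # best of the p candidates
        in_tree[w] = True
        tree.append(e[w])
    return tree


inf = math.inf
cost = [[inf, 1, 4, inf],
        [1, inf, 3, 2],
        [4, 3, inf, 5],
        [inf, 2, 5, inf]]
print(prim_partitioned(cost, 2))   # [(0, 1), (1, 3), (1, 2)]
```

In a true parallel run the candidate list would be replaced by the locked update of globalmin shown in module nodescan.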


V.  COMPUTATIONAL  RESULTS  WITH  PARALLEL  ALGORITHMS 


Both algorithms of Section IV were coded in FORTRAN for the Balance 21000
located in the Center for Applied Parallel Processing at Southern Methodist
University. The Balance 21000 is configured with twenty NS32032 cpu's, 32
Mbytes of shared memory, and 16K user-accessible hardware locks. Each cpu has
8 Kbytes of local RAM and 8 Kbytes of cache. The Balance 21000 runs the DYNIX
operating system, a version of UNIX 4.2bsd. DYNIX includes routines to create,
synchronize, and terminate parallel processes from C, Pascal, and FORTRAN. More
details about the Balance 21000 may be found in [13].

Table 4 gives the computational results with Boruvka's algorithm. The
times are wall clock times and are the average for three runs. The first row
in each table contains the time for the sequential version of BORUVKA and all
other rows contain times for the parallel version. The sequential version is
25[?] lines of code, while the parallel version required over 400 lines. The
speedup for a row is calculated by dividing the best sequential time by the
time for that row.

Initially, the parallel code creates the additional processes to be used
and requires each of them to build data tables which give the location in
virtual memory of all shared data. Once this is done, the processes can be
used repeatedly with little system overhead. However, this initial creation
and the subsequent killing of those processes at termination can be very
expensive for this type of problem. The first column of times includes the
creation and process termination time while the second does not. Hence, if a
350 x 350 minimal spanning tree was to be obtained one time, then the best
speedup is 2.6 using seven cpu's. If, however, this is a subprogram of a larger
system, then a 350 x 350 problem can yield a speedup of four on six processors
and a speedup of five on ten.


Table 4 About Here


Table  5  gives  the  computational  results  with  Prim’s  algorithm.  No  speedup 
is  achievable  for  a  one-time  solution.  For  environments  in  which  the  minimum 
spanning  tree  problem  is  a  subproblem,  speedups  of  three  and  five  were  obtained 
on  five  and  ten  processors,  respectively. 
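
The speedups quoted in this section follow directly from the definition given earlier (best sequential time divided by the parallel time). As a small check, the Python sketch below recomputes four of them from the times in Tables 4 and 5 with process creation excluded. The pairing of each time with a processor count is an assumption: the rows after the sequential row are taken to be 1, 2, ..., 10 cpu's in order, which is consistent with the statements in the text.

```python
def speedup(best_sequential, parallel):
    """Speedup as defined in the text: best sequential time / parallel time."""
    return best_sequential / parallel


# Wall clock seconds copied from Tables 4 and 5 (process creation excluded).
boruvka_seq, boruvka = 98.21, {6: 23.45, 10: 17.52}
prim_seq, prim = 24.88, {5: 7.39, 10: 5.02}

for cpus, t in boruvka.items():
    print("Boruvka", cpus, "cpu's:", round(speedup(boruvka_seq, t), 2))
for cpus, t in prim.items():
    print("Prim   ", cpus, "cpu's:", round(speedup(prim_seq, t), 2))
```

The four values come out as 4.19, 5.61, 3.37, and 4.96, matching the speedup columns of the tables.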


Table  5  About  Here 




VI.  SUMMARY  AND  CONCLUSIONS 


Five computer codes were developed to solve the minimum spanning tree
problem on a sequential machine. These codes were computationally compared on
both grid graphs and random graphs whose densities varied from 5% to 100%. The
implementation of Boruvka's algorithm (see [15, p. 277]) was the best for grid
graphs. An implementation of Prim's algorithm using a sparse data
representation (see [15, p. 273]) was best for high density random graphs while an
implementation of Prim's algorithm using a d-heap (see [19, p. 77]) was best for
lower density random problems. Kruskal's algorithm using a quicksort is the
most robust of all the implementations, ranking either second or third in all
computational tests. Both Boruvka's and Prim's algorithms were parallelized by
the method of data partitioning (also called homogeneous multitasking). This
involves creating multiple, identical processes and assigning a portion of the
data to each processor. For the environment in which a minimal spanning tree
problem is a subproblem within a larger system, speedups of five on ten
processors were achieved with both Prim's and Boruvka's algorithms. The
overhead for parallel processing on the Balance 21000 negates most of the
benefits of parallel processing for the first solution of the minimal spanning
tree.


Table 1.  Comparison of Sequential Algorithms on Grid Graphs
(cost range is 0 - 10,000)

| Grid Size |       | Graph   | DENSE | SPARSE | HEAP |         |         |
| n x n     | Edges | Density | PRIM  | PRIM   | PRIM | KRUSKAL | BORUVKA |
|-----------|-------|---------|-------|--------|------|---------|---------|
| 15 x 15   |   420 |   1.7%  |  1.70 |    .36 |  .27 |     .19 |     .12 |
| 18 x 18   |   612 |   1.2%  |  3.54 |    .74 |  .42 |     .30 |     .17 |
| 20 x 20   |   760 |   1.0%  |  5.43 |   1.10 |  .54 |     .39 |     .21 |
| 24 x 24   | 1,104 |    .7%  | 11.32 |   2.19 |  .82 |     .63 |     .30 |
| 28 x 28   | 1,512 |    .5%  | 21.01 |   4.09 | 1.13 |     .86 |     .46 |
| 30 x 30   | 1,740 |    .4%  | 27.82 |   5.41 | 1.37 |    1.15 |     .55 |
|-----------|-------|---------|-------|--------|------|---------|---------|
| Total Time (secs.)          | 70.82 |  13.89 | 4.55 |    3.52 |    1.81 |
| Rank                        |     5 |      4 |    3 |       2 |       1 |

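Table 2.  Comparison of Sequential Algorithms on High Density Random Graphs
(cost range is 0 - 10,000)

|          |         | Graph   | DENSE | SPARSE | HEAP  |         |         |
| Vertices | Edges   | Density | PRIM  | PRIM   | PRIM  | KRUSKAL | BORUVKA |
|----------|---------|---------|-------|--------|-------|---------|---------|
| 200      |  19,900 |  100%   |  1.39 |   1.14 |  1.44 |    1.52 |    3.01 |
| 200      |  15,920 |   80%   |  1.39 |    .97 |  1.22 |    1.52 |    1.96 |
| 200      |  11,940 |   60%   |  1.39 |    .79 |   .99 |     .96 |    1.47 |
| 200      |   7,960 |   40%   |  1.39 |    .61 |   .76 |     .89 |    1.02 |
| 400      |  79,800 |  100%   |  5.67 |   4.55 |  5.42 |    4.45 |   12.03 |
| 400      |  63,840 |   80%   |  5.69 |   3.85 |  4.53 |    3.58 |   10.28 |
| 400      |  47,880 |   60%   |  5.70 |   3.13 |  3.62 |    2.82 |    7.26 |
| 400      |  31,920 |   40%   |  5.71 |   2.49 |  2.68 |    1.97 |    4.85 |
| 600      | 179,700 |  100%   | 13.28 |  10.39 | 11.98 |   12.38 |   29.85 |
| 600      | 143,760 |   80%   | 13.66 |   8.79 |  9.99 |   14.99 |   23.72 |
| 600      | 107,820 |   60%   | 13.16 |   7.15 |  7.99 |   10.63 |   17.79 |
| 600      |  71,880 |   40%   | 13.02 |   5.55 |  5.67 |    6.05 |   11.80 |
|----------|---------|---------|-------|--------|-------|---------|---------|
| Total Time (secs.)           | 81.45 |  49.41 | 56.29 |   61.76 |  125.04 |


Table 3.  Comparison of Sequential Algorithms on Low Density Random Graphs
(cost range is 0 - 10,000)

|          |         | Graph   | DENSE | SPARSE | HEAP  |         |         |
| Vertices | Edges   | Density | PRIM  | PRIM   | PRIM  | KRUSKAL | BORUVKA |
|----------|---------|---------|-------|--------|-------|---------|---------|
| 200      |   3,980 |   20%   |  1.40 |    .44 |   .49 |     .50 |     .52 |
| 200      |   1,990 |   10%   |  1.40 |    .36 |   .39 |     .40 |     .35 |
| 200      |     995 |    5%   |  1.39 |    .32 |   .32 |     .35 |     .17 |
| 400      |  15,960 |   20%   |  5.66 |   1.75 |  1.62 |    1.47 |    2.46 |
| 400      |   7,980 |   10%   |  5.71 |   1.40 |  1.12 |    1.53 |    1.30 |
| 400      |   3,990 |    5%   |  5.72 |   1.21 |   .86 |    1.20 |     .72 |
| 600      |  35,940 |   20%   | 13.04 |   3.94 |  3.39 |    3.99 |    6.02 |
| 600      |  17,970 |   10%   | 13.04 |   3.05 |  2.14 |    2.89 |    2.86 |
| 600      |   8,985 |    5%   | 13.07 |   2.73 |  1.50 |    2.12 |    1.52 |
|----------|---------|---------|-------|--------|-------|---------|---------|
| Total Time (secs.)           | 60.43 |  15.20 | 11.83 |   14.45 |   15.92 |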

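Table 4.  Parallel Boruvka on 350 x 350 Grid Graph
|V| = 122,500   |E| = 244,300
(cost range is 0 - 100,000)

|       | PARALLEL BORUVKA            | PARALLEL BORUVKA            |
|       | (includes process creation) | (excludes process creation) |
| cpu's | time          | speedup     | time          | speedup     |
|-------|---------------|-------------|---------------|-------------|
| seq.  |  98.21        | 1.00        |  98.21        | 1.00        |
| 1     | 112.57        |  .87        | 103.86        |  .95        |
| 2     |  66.93        | 1.47        |  57.49        | 1.71        |
| 3     |  50.26        | 1.95        |  40.92        | 2.40        |
| 4     |  40.25        | 2.44        |  29.95        | 3.28        |
| 5     |  39.00        | 2.52        |  26.52        | 3.70        |
| 6     |  38.69        | 2.54        |  23.45        | 4.19        |
| 7     |  37.70        | 2.60        |  21.62        | 4.54        |
| 8     |  40.98        | 2.40        |  21.58        | 4.55        |
| 9     |  42.49        | 2.31        |  20.85        | 4.71        |
| 10    |  41.30        | 2.38        |  17.52        | 5.61        |


Table 5.  Parallel Prim on G = [V,E] with |V| = 900 and |E| = 404,550
(cost range is 0 - 100,000)

|       | PARALLEL PRIM               | PARALLEL PRIM               |
|       | (includes process creation) | (excludes process creation) |
| cpu's | time          | speedup     | time          | speedup     |
|-------|---------------|-------------|---------------|-------------|
| seq.  |  24.88        | 1.00        |  24.88        | 1.00        |
| 1     |  27.09        |  .92        |  26.98        |  .92        |
| 2     |  23.35        | 1.07        |  15.12        | 1.65        |
| 3     |  22.63        | 1.10        |  10.84        | 2.30        |
| 4     |  25.31        |  .98        |   8.74        | 2.85        |
| 5     |  28.43        |  .88        |   7.39        | 3.37        |
| 6     |  31.54        |  .79        |   6.62        | 3.76        |
| 7     |  36.51        |  .68        |   6.03        | 4.13        |
| 8     |  41.08        |  .61        |   5.62        | 4.43        |
| 9     |  46.04        |  .54        |   5.30        | 4.69        |
| 10    |  50.54        |  .49        |   5.02        | 4.96        |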

